Recursively downloading a remote HTTP directory through Java

I want to create a function to download a remote directory (Ex: "https://server.net/production/current/") via HTTP to a local folder. I don't have control over the remote directory so I can't just create a convenient tarball. I was able to find lots of questions related to retrieving individual files, but I couldn't find one that matched my use case.
To give you an idea of what I am referring to, here is a sample of what the directory looks like in a browser.
In other words, I want to create a function equivalent to this wget command, where Y is the local destination folder and X is the remote directory to retrieve. I would call wget directly, but I want a cross-platform solution that will work on Windows without additional setup.
wget -r -np -R "index.html*" -P Y X
The end goal is a java function like the one shown below.
/**
* Recursively downloads all of the files in a remote HTTPS directory to the local destination
* folder.
* @param remoteFolder a folder URL (Ex: "https://server.net/production/current/")
* @param destination a local folder (Ex: "C:\Users\Home\project\production")
*/
public static void downloadDirectory(String remoteFolder, String destination) {}
It can assume there are no circular dependencies in the remote directory and that the destination folder exists and is empty.

I was hoping there was some magic function or best practice in java.io or maybe Apache commons-io to do this, but since it sounds like none exists, I wrote my own version that manually goes through the HTML page and follows links.
I'm just going to leave this answer here in case someone else has the same question or someone knows a way to improve my version.
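import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.io.FileUtils;

private static final Pattern HREF_PATTERN = Pattern.compile("href=\"(.*?)\"");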
/**
* Recursively downloads all of the files in a remote HTTPS directory to a local
* destination folder. This implementation requires that the destination string
* ends in a file delimiter. If you don't know if it does, append "/" to the end
* just to be safe.
*
* @param src remote folder URL (Ex: "https://server.net/production/current/")
* @param dst local folder to copy into (Ex: "C:\Users\Home\project\production\")
*/
public static void downloadDirectory(String src, String dst) throws IOException {
Scanner out = new Scanner(new URL(src).openStream(), "UTF-8").useDelimiter("\n");
List<String> hrefs = new ArrayList<>(8);
while (out.hasNext()) {
Matcher match = HREF_PATTERN.matcher(out.next());
if (match.find())
hrefs.add(match.group(1));
}
out.close();
for (String next : hrefs) {
if (next.equals("../"))
continue;
if (next.endsWith("/"))
downloadDirectory(src + next, dst + next); // recurse into the subdirectory
else
FileUtils.copyURLToFile(new URL(src + next), new File(dst + next));
}
}
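The version above follows every href except "../". Depending on the server, a listing page may also contain column-sort links, absolute links, or links to other sites. Below is a minimal sketch of a stricter filter, assuming an Apache-style index page; the names ListingLinks, childLinks and isChild are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListingLinks {
    private static final Pattern HREF = Pattern.compile("href=\"(.*?)\"");

    /** Returns the hrefs in a listing page that point at children of the listed directory. */
    static List<String> childLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) { // looping on find() also catches several links on one line
            String href = m.group(1);
            if (isChild(href)) {
                links.add(href);
            }
        }
        return links;
    }

    /** Assumption: children are plain relative names such as "file.txt" or "sub/". */
    static boolean isChild(String href) {
        return !href.isEmpty()
                && !href.startsWith("?")    // column-sort links such as "?C=N;O=D"
                && !href.startsWith("/")    // server-absolute links
                && !href.startsWith("../")  // parent directory
                && !href.contains("://");   // links to other sites
    }
}
```

The returned names could then be fed to the same loop as before: names ending in "/" are recursed into, everything else is copied with FileUtils.copyURLToFile.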

Related

Creating Java config files

What exactly is the best way to generate config files for my Java application? Right now I have a runnable jar which, when opened, shows a file selector GUI so the user can choose where the config files and saved data should be stored. My default config file is saved in my resource folder, and I want to copy that file to the chosen location. The problem is that I am not sure how to refer back to those files in the future, because I only want the file selector to pop up once. As soon as the application is closed, all references to that path are lost. The only thing I can think of is to replace the config.yml file inside the resource folder with the newly generated file and include a file location parameter (if that is even possible). To be honest, I am not sure how programs actually handle this and would love any insight into this topic.
Perhaps place the selected path into a small text file located within the Startup directory of your JAR file. Encrypt it if you like or whatever. The method below should provide you with the Directory (folder) path of where your JAR file was started:
/**
* Returns the path from where the JAR file was started. In other words, the
* installed JAR file's home directory (folder).<br><br>
*
* <b>Example Usage:</b><pre>
* <code>String applicationPath = appPath(MyStartupClassName.class);</code></pre>
*
* @param mainStartupClassName (Class) The main startup class for your
* particular Java application. To be supplied
* as: <b>MyMainClass.class</b><br>
*
* @return (String) The full path to where the JAR file resides (its Home
* Directory).
*/
public static String appPath(Class<?> mainStartupClassName) {
try {
String path = mainStartupClassName.getProtectionDomain().getCodeSource().getLocation().getPath();
String pathDecoded = java.net.URLDecoder.decode(path, "UTF-8");
pathDecoded = pathDecoded.trim().replace("/", File.separator);
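// Only Windows paths arrive as "/C:/..."; on Unix the leading separator is part of the absolute path.
if (File.separatorChar == '\\' && pathDecoded.startsWith(File.separator)) {
pathDecoded = pathDecoded.substring(1);
}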
return pathDecoded.substring(0, pathDecoded.lastIndexOf(File.separator));
}
catch (java.io.UnsupportedEncodingException ex) {
java.util.logging.Logger.getLogger("appPath()")
.log(java.util.logging.Level.SEVERE, null, ex);
}
return null;
}
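One way to keep the selected path in a small text file, as suggested above, is a java.util.Properties file. This is only a sketch: the class name ConfigLocationStore and the key "config.dir" are invented for the example, and where the pointer file lives (for instance the folder returned by appPath) is up to you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class ConfigLocationStore {
    private static final String KEY = "config.dir"; // hypothetical property name

    /** Writes the folder the user picked into a small properties file. */
    static void save(Path pointerFile, Path chosenDir) {
        Properties props = new Properties();
        props.setProperty(KEY, chosenDir.toString());
        try (OutputStream out = Files.newOutputStream(pointerFile)) {
            props.store(out, "Location of the application's config files");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Returns the remembered folder, or null if nothing has been saved yet. */
    static Path load(Path pointerFile) {
        if (!Files.exists(pointerFile)) {
            return null; // first start: show the file selector, then call save()
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(pointerFile)) {
            props.load(in);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        String dir = props.getProperty(KEY);
        return dir == null ? null : Paths.get(dir);
    }
}
```

On startup the application would call load first and only open the file selector when it returns null.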

How to have my java project to use some files without using their absolute path?

I have written a project where some images are used for the application's appearance and some text files get created and deleted along the way. So far I have used absolute paths for all of these files just to see how the project would work, and now that it is finished I want to send it to someone else. How can I link those files to the project so that the other person doesn't have to adjust the absolute paths for their computer? For example, I would put the final jar file and the necessary files into a zip file; the other person extracts the zip file, imports the jar file, and the program runs without any problems.
By the way, I add the images using the ImageIcon class.
I'm using Eclipse.
For files that you just want to read, such as images used in your app's icons:
Ship them the same way you ship your class files: In your jar or jmod file.
Use YourClassName.class.getResource or .getResourceAsStream to read these. They are not files, so any API that needs a File object can't work with them. Don't use those APIs (they are bad) - good APIs take a URI, URL, or InputStream, which works fine with this.
Example:
package com.foo;
public class MyMainApp {
public void example() {
ImageIcon icon = new ImageIcon(MyMainApp.class.getResource("img/send.png"));
}
public void example2() throws IOException {
try (var raw = MyMainApp.class.getResourceAsStream("/data/countries.txt")) {
BufferedReader in = new BufferedReader(
new InputStreamReader(raw, StandardCharsets.UTF_8));
for (String line = in.readLine(); line != null; line = in.readLine()) {
// do something with each country
}
}
}
}
This class file will end up in your jar as /com/foo/MyMainApp.class. That same jar file should also contain /com/foo/img/send.png and /data/countries.txt. (Note that the string argument you pass to getResource(AsStream) can start with a slash or not. A leading slash makes the name relative to the root of the jar; without it, the name is relative to the package of the class. Your choice as to what you find nicer.)
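To see the two naming styles side by side without building a jar, here is a small sketch that looks up a class file that ships with the JDK itself. ResourceNameDemo is a made-up name; the assumption is that a JDK's own String.class is reachable as a resource, which I expect on current JDKs but have only reasoned through, not checked on every version.

```java
import java.net.URL;

final class ResourceNameDemo {
    /** No leading slash: resolved relative to the package of String, i.e. java/lang/. */
    static URL viaRelativeName() {
        return String.class.getResource("String.class");
    }

    /** Leading slash: resolved from the root of the class path or module. */
    static URL viaAbsoluteName() {
        return String.class.getResource("/java/lang/String.class");
    }

    public static void main(String[] args) {
        // Both lines should print the same URL.
        System.out.println(viaRelativeName());
        System.out.println(viaAbsoluteName());
    }
}
```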
For files that your app will create / update:
This shouldn't be anywhere near where your jar file is. That's late 80s/silly Windows thinking. Applications are (or should be!) in places that the app cannot write to. In general the installation directory of an application is a read-only affair, and most certainly should not contain a user's documents. These should be in the 'user home' or possibly in e.g. 'My Documents'.
Example:
public void save() throws IOException {
Path p = Paths.get(System.getProperty("user.home"), "navids-app.save");
// save to that file.
}

Internal JAR uses files on the file system

I have a use case where I need to export this specific piece of code as a java library (which will be a JAR eventually) but the problem is that it needs to use some piece of information stored in physical files on the file system.
I have 2 questions here:
1) Where should I put these files on the filesystem (One option that I could think of was in the resources directory of the Java module containing the library: Have a doubt though that the resources directory also gets compiled into the jar?)
2) When I am using this library from an external Java application, how would the library be able to locate the files? Would they still be in the classpath?
You have two options, first one is to place the files inside the package structure, so that they will be packed inside the jar. You would get them from the code like this:
getClass().getResourceAsStream("/path/to/your/resource/resource.ext");
If you would call it from a static method of class named A then you should write like this:
A.class.getResourceAsStream("/path/to/your/resource/resource.ext");
The "/path" part of the path is the topmost package, and the resource.ext is your file name.
The other option is to put them outside the jar package, but then the jar needs to know their location:
provide it as an argument to the program (java -jar program.jar system/path/to/file)
hardcode the location from which you would read the file with paths
The way I understood your question and answered it, it has nothing to do with the classpath:
The CLASSPATH variable is one way to tell applications, including the JDK tools, where to look for user classes. (Classes that are part of the JRE, JDK platform, and extensions should be defined through other means, such as the bootstrap class path or the extensions directory.)
EDIT:
but you can nevertheless, put it there and get it from code like this:
System.getProperty("java.class.path");
It would however require some logic to parse it out.
You can pass the location of the files in a property file or some technique like this.
Where should I put these files on the filesystem
That is up to you to decide, though it would be a good idea to make this configurable. It would also be a good idea to try to fit into the conventions of the host operating system / distro, though these vary ... and depend on the nature of your application.
When I am using this library from an external Java application, how would the library be able to locate the files?
You would typically use a configuration property or initialization parameter to hold/pass the location. If you were writing an application rather than a library, you could use the Java Preferences APIs, though this is probably a poor choice for a library.
Would they still be in the classpath?
Only if you put the location on the classpath ... and that is going to make configuration more tricky. Given that these files are required to be stored in the file system, I'd recommend using FileInputStream or similar.
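A rough sketch of the "configuration property or initialization parameter" idea: the library takes an explicit directory if the caller passes one, otherwise falls back to a system property, otherwise to a folder under the user's home. The class name DataLocation, the property name "mylib.data.dir" and the fallback folder ".mylib" are all placeholders chosen for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DataLocation {
    static final String PROPERTY = "mylib.data.dir"; // hypothetical property name

    /**
     * Picks the directory that holds the library's data files.
     * Order: explicit argument, then -Dmylib.data.dir=..., then ~/.mylib.
     */
    static Path resolve(String explicitDir) {
        if (explicitDir != null && !explicitDir.isEmpty()) {
            return Paths.get(explicitDir);
        }
        String fromProperty = System.getProperty(PROPERTY);
        if (fromProperty != null && !fromProperty.isEmpty()) {
            return Paths.get(fromProperty);
        }
        return Paths.get(System.getProperty("user.home"), ".mylib");
    }
}
```

The host application can then either pass the directory in code or start the JVM with -Dmylib.data.dir=..., and the library opens its files with something like Files.newInputStream(DataLocation.resolve(null).resolve("file.ext")).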
Using Eclipse, I always create a package 'resources' where I put the files the jar needs. I access the files (from pretty much anywhere) through
this.getClass().getClassLoader().getResource("resources/file.ext");
With export->runnable jar all those files are included in the .jar. I'm not sure this is the correct way of doing it though. Note that there is no "/" before resources: a ClassLoader treats every resource name as relative to the root of the classpath, and a leading slash typically makes the lookup return null.
I found a relevant answer as a part of another question : How to load a folder from a .jar?
I am able to successfully retrieve the files using the following code:
/**
* List directory contents for a resource folder. Not recursive.
* This is basically a brute-force implementation.
* Works for regular files and also JARs.
*
* @author Greg Briggs
* @param clazz Any java class that lives in the same place as the resources you want.
* @param path Should end with "/", but not start with one.
* @return Just the name of each member item, not the full paths.
* @throws URISyntaxException
* @throws IOException
*/
String[] getResourceListing(Class clazz, String path) throws URISyntaxException, IOException {
URL dirURL = clazz.getClassLoader().getResource(path);
if (dirURL != null && dirURL.getProtocol().equals("file")) {
/* A file path: easy enough */
return new File(dirURL.toURI()).list();
}
if (dirURL == null) {
/*
* In case of a jar file, we can't actually find a directory.
* Have to assume the same jar as clazz.
*/
String me = clazz.getName().replace(".", "/")+".class";
dirURL = clazz.getClassLoader().getResource(me);
}
if (dirURL.getProtocol().equals("jar")) {
/* A JAR path */
String jarPath = dirURL.getPath().substring(5, dirURL.getPath().indexOf("!")); //strip out only the JAR file
JarFile jar = new JarFile(URLDecoder.decode(jarPath, "UTF-8"));
Enumeration<JarEntry> entries = jar.entries(); //gives ALL entries in jar
Set<String> result = new HashSet<String>(); //avoid duplicates in case it is a subdirectory
while(entries.hasMoreElements()) {
String name = entries.nextElement().getName();
if (name.startsWith(path)) { //filter according to the path
String entry = name.substring(path.length());
int checkSubdir = entry.indexOf("/");
if (checkSubdir >= 0) {
// if it is a subdirectory, we just return the directory name
entry = entry.substring(0, checkSubdir);
}
result.add(entry);
}
}
return result.toArray(new String[result.size()]);
}
throw new UnsupportedOperationException("Cannot list files for URL "+dirURL);
}

Java Reading a file line per line

This is what my file looks like:
IDENTIFICATION::HARD::Should We appreciate Art?::Yes
MULTIPLECHOICE::HARD::Which of the FF is not an era of Art?::Bayutism::Digitalism,Somethingism,Retardism,Bayutism
IDENTIFICATION::HARD::What is Chris Browns Greatest Hit?::Rihanna
And I am reading the file like this
public void openQBankFile(){
try{
BufferedReader in = new BufferedReader(new FileReader(qbank.getAbsolutePath()));
String desc;
while((desc = in.readLine()) != null){
qbank_cont.add(desc);
}
in.close();
}catch(FileNotFoundException fnfe){
System.out.println("Question Repository Could Not Be Found");
return;
}catch(IOException ioe){
ioe.printStackTrace();
}
}
This is where I get the contents of the ArrayList
public static void main(String[] args){
CreateQuiz cq = new CreateQuiz(new File("./quiz/HUM101.quiz"),new File("./qbank/HUM101.qbank"));
cq.openQBankFile();
cq.filterQuestions(3, "HARD");
System.out.println(cq.qbank_cont.get(0));
}
And this is how I add it
public void filterQuestions(int numOfItems, String difficulty){
List<String> qt_diff = new ArrayList<String>();
for(int i = 0; i< qbank_cont.size();i++){
qt_diff.add(qbank_cont.get(i));
}
}
I then store it inside an ArrayList, but when I store it, the whole text is inserted instead of one line per entry. (I am checking with arrayList.get(0).)
The root of your problem seems to be that you don't understand how Java deals with relative pathnames.
Here's what the File javadoc says:
On UNIX systems, a relative pathname is made absolute by resolving it against the current user directory. On Microsoft Windows systems, a relative pathname is made absolute by resolving it against the current directory of the drive named by the pathname, if any; if not, it is resolved against the current user directory.
The "current user directory" means the current directory that was in force when the application was launched. That depends on how the application was launched. For instance, if you launch from a command shell using java, the current directory will be the shell's current directory. But if you use a launch wrapper script, it may cd to somewhere else before launching the JVM. And so on.
But the bottom line is that if you are going to use relative paths, you need to have the right current directory.
Incidentally, when the JVM starts, the absolute path of the current directory is placed in the System Properties object under the key user.dir. You can read the property to find out what the current directory is, but changing the property does NOT change the way that relative paths are resolved by the File API and friends. AFAIK, there is no way that a pure Java application can reliably change its own current directory.
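To check which directory a relative path such as "./qbank/HUM101.qbank" actually resolves against, something like the following sketch can help. WorkingDirDemo is a made-up name; the only assumption is that nothing has modified the user.dir property since startup.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class WorkingDirDemo {
    /** The directory the JVM was started in, as recorded in the user.dir property. */
    static Path currentDir() {
        return Paths.get(System.getProperty("user.dir")).normalize();
    }

    /** Makes a relative path absolute the way the JVM resolves it, dropping "." segments. */
    static Path absolute(String relative) {
        return Paths.get(relative).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        System.out.println("current directory: " + currentDir());
        System.out.println("question bank:     " + absolute("./qbank/HUM101.qbank"));
    }
}
```

If the second line does not point at the file you expect, the application was launched from a different directory than you assumed.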

Getting a directory inside a .jar

I am trying to access a directory inside my jar file.
I want to go through each of the files inside the directory itself. I tried, for example, using the following:
URL imagesDirectoryURL=getClass().getClassLoader().getResource("Images");
if(imagesDirectoryURL!=null)
{
File imagesDirectory= new File(imagesDirectoryURL.getFile());
}
If I test this applet, it works well. But once I put the contents into the jar, it doesn't because of several reasons.
If I use this code, the URL always points outside the jar, so I have to put the Images directory there.
But if I use new File(imagesDirectoryURL.toURI());, it doesn't work inside the jar because I get the error URI not hierarchical. I am sure the directory exists inside the jar.
How am I supposed to get the contents of Images inside the jar?
Here is a solution which should work given that you use Java 7... The "trick" is to use the new file API. Oracle JDK provides a FileSystem implementation which can be used to peek into/modify ZIP files, and that includes jars!
Preliminary: grab System.getProperty("java.class.path", "."), split it on the path separator (File.pathSeparator, which is : on Unix and ; on Windows); this will give you all entries in your defined classpath.
First, define a method to obtain a FileSystem out of a classpath entry:
private static final Map<String, ?> ENV = Collections.emptyMap();
//
private static FileSystem getFileSystem(final String entryName)
throws IOException
{
final String uri = (entryName.endsWith(".jar") || entryName.endsWith(".zip"))
? "jar:file:" + entryName : "file:" + entryName;
return FileSystems.newFileSystem(URI.create(uri), ENV);
}
Then create a method to tell whether a path exists within a filesystem:
private static boolean pathExists(final FileSystem fs, final String needle)
{
final Path path = fs.getPath(needle);
return Files.exists(path);
}
Use it to locate your directory.
Once you have the correct FileSystem, use it to walk your directory using .getPath() as above and open a DirectoryStream using Files.newDirectoryStream().
And don't forget to .close() a FileSystem once you're done with it!
Here is a sample main() demonstrating how to read all the root entries of a jar:
public static void main(final String... args)
throws IOException
{
final Map<String, ?> env = Collections.emptyMap();
final String jarName = "/opt/sunjdk/1.6/current/jre/lib/plugin.jar";
final URI uri = URI.create("jar:file:" + jarName);
// try-with-resources closes both the directory stream and the FileSystem
try (FileSystem fs = FileSystems.newFileSystem(uri, env);
DirectoryStream<Path> entries = Files.newDirectoryStream(fs.getPath("/"))) {
for (Path entry : entries)
System.out.println(entry);
}
}
Paths within Jars are paths, not actual directories as you can use them on a file system. To get all resources within a particular path of a Jar file:
Obtain a URL pointing to the Jar.
Get an InputStream from the URL.
Construct a ZipInputStream from the InputStream.
Iterate each ZipEntry, looking for matches to the desired path.
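The four steps above could look roughly like this. It is a sketch, not the answerer's code: JarPathLister and its method names are invented, and how you obtain the URL of the Jar (for example from the applet's code base) is left to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class JarPathLister {
    /** Steps 1 and 2: open a stream on the URL that points to the Jar. */
    static List<String> list(URL jarUrl, String prefix) {
        try (InputStream in = jarUrl.openStream()) {
            return list(in, prefix);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Steps 3 and 4: wrap the stream in a ZipInputStream and keep the matching entries. */
    static List<String> list(InputStream in, String prefix) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                // skip directory entries, keep files whose name starts with e.g. "Images/"
                if (!entry.isDirectory() && entry.getName().startsWith(prefix)) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Each returned name can then be passed to getResource or getResourceAsStream (with a leading "/" when called on a Class) to load the image itself.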
..will I still be able to test my Applet when it's not inside that jar? Or will I have to program two ways to get my Images?
The ZipInputStream will not work with loose resources in directories on the file system. But then, I would strongly recommend using a build tool such as Ant to build (compile/jar/sign etc.) the applet. It might take an hour or so to write the build script & check it, but thereafter you can build the project by a few keystrokes and a couple of seconds.
It would be quite annoying if I always have to extract and sign my jar if I want to test my Applet
I'm not sure what you mean there. Where does the 'extract' come into it? In case I was not clear, a sand-boxed applet can load resources this way, from any Jar that is mentioned in the archive attribute. Another thing you might do, is to separate the resource Jar(s) from the applet Jar. Resources typically change less than code, so your build might be able to take some shortcuts.
I think I really have to consider putting my Images into a separate directory outside the jar.
If you mean on the server, there will be no practical way to get a listing of the image files short of help from the server. E.G. Some servers are insecurely set up to produce an HTML based 'file list' for any directory with no default file (such as an index.html).
I have only got one jar, in which my classes, images and sounds are.
OK - consider moving the sounds & images into a separate Jar. Or at the very least, put them in the Jar with 'no compression'. While Zip compression techniques work well with classes, they are less efficient at compressing (otherwise already compressed) media formats.
I have to sign it because I use the "Preferences" class to save user settings.
There are alternatives to the Preferences for applets, such as cookies. In the case of plug-in 2 architecture applet, you can launch the applet (still embedded in the browser) using Java Web Start. JWS offers the PersistenceService. Here is my small demo. of the PersistenceService.
Speaking of JWS, that brings me to: Are you absolutely certain this game would be better as an applet, rather than an app (e.g. using a JFrame) launched using JWS?
Applets will give you no end of stress, and JWS has offered the PersistenceService since it was introduced in Java 1.2.
You can use the PathMatchingResourcePatternResolver provided by Spring.
public class SpringResourceLoader {
public static void main(String[] args) throws IOException {
PathMatchingResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
// Ant-style path matching
Resource[] resources = resolver.getResources("/Images/**");
for (Resource resource : resources) {
System.out.println("resource = " + resource);
InputStream is = resource.getInputStream();
BufferedImage img = ImageIO.read(is);
System.out.println("img.getHeight() = " + img.getHeight());
System.out.println("img.getWidth() = " + img.getWidth());
}
}
}
I didn't do anything fancy with the returned Resource but you get the picture.
Add this to your maven dependency (if using maven):
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>3.1.2.RELEASE</version>
</dependency>
This will work directly from within Eclipse/NetBeans/IntelliJ and in the jar that's deployed.
Running from within IntelliJ gives me the following output:
resource = file [C:\Users\maba\Development\stackoverflow\Q12016222\target\classes\pictures\BMW-R1100S-2004-03.jpg]
img.getHeight() = 768
img.getWidth() = 1024
Running from command line with executable jar gives me the following output:
C:\Users\maba\Development\stackoverflow\Q12016222\target>java -jar Q12016222-1.0-SNAPSHOT.jar
resource = class path resource [pictures/BMW-R1100S-2004-03.jpg]
img.getHeight() = 768
img.getWidth() = 1024
I think you can directly access resources in a ZIP/JAR file.
Please see this tutorial, which gives a solution to your question:
How to extract Java resources from JAR and zip archives
Hope that helps.
If I understand your problem, you want to look at a directory inside the jar and check all the files inside that directory. You can do something like:
JarInputStream jar = new JarInputStream(new FileInputStream("D:\\x.jar"));
JarEntry jarEntry ;
// getNextJarEntry() returns null after the last entry, which ends the loop
while((jarEntry = jar.getNextJarEntry()) != null)
{
if(!jarEntry.isDirectory())
{
String str = jarEntry.getName();
if(str.startsWith("weblogic/xml/saaj"))
{
// anything that reaches this point is inside the weblogic/xml/saaj directory
}
}
}
jar.close();
What you are looking for here might be the JarEntry list of the Jar... I had done some similar work during grad school... You can get the modified class here (http://code.google.com/p/marcellodesales-cs-research/source/browse/trunk/grad-ste-ufpe-brazil/ptf-add-on-dev/src/br/ufpe/cin/stp/global/filemanager/JarFileContentsLoader.java) Note that the URL contains an older Java class not using Generics...
This class returns a set of URLs with the protocol "jar:file:/" for a given token...
package com.collabnet.svnedge.discovery.client.browser.util;
import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
public class JarFileContentsLoader {
private JarFile jarFile;
public JarFileContentsLoader(String jarFilePath) throws IOException {
this.jarFile = new JarFile(jarFilePath);
}
/**
* @param existingPath an existing path string inside the jar.
* @return the set of URLs from inside the Jar (whose protocol is "jar:file:/")
*/
public Set<URL> getDirEntries(String existingPath) {
Set<URL> set = new HashSet<URL>();
Enumeration<JarEntry> entries = jarFile.entries();
while (entries.hasMoreElements()) {
String element = entries.nextElement().getName();
URL url = getClass().getClassLoader().getResource(element);
// getResource returns null if the jar is not on the class path
if (url != null
&& url.toString().contains("jar:file")
&& !element.contains(".class")
&& element.contains(existingPath)) {
set.add(url);
}
}
return set;
}
public static void main(String[] args) throws IOException {
JarFileContentsLoader jarFileContents = new JarFileContentsLoader(
"/u1/svnedge-discovery/client-browser/lib/jmdns.jar");
Set<URL> entries = jarFileContents.getDirEntries("impl");
Iterator<URL> a = entries.iterator();
while (a.hasNext()) {
URL element = a.next();
System.out.println(element);
}
}
}
The output would be:
jar:file:/u1/svnedge-discovery/client-browser/lib/jmdns.jar!/javax/jmdns/impl/constants/
jar:file:/u1/svnedge-discovery/client-browser/lib/jmdns.jar!/javax/jmdns/impl/tasks/state/
jar:file:/u1/svnedge-discovery/client-browser/lib/jmdns.jar!/javax/jmdns/impl/tasks/resolver/
jar:file:/u1/svnedge-discovery/client-browser/lib/jmdns.jar!/javax/jmdns/impl/
jar:file:/u1/svnedge-discovery/client-browser/lib/jmdns.jar!/javax/jmdns/impl/tasks/
Maybe the following code sample can help you:
Enumeration<URL> inputStream = BrowserFactory.class.getClassLoader().getResources(".");
System.out.println("INPUT STREAM ==> "+inputStream);
System.out.println(inputStream.hasMoreElements());
while (inputStream.hasMoreElements()) {
URL url = (URL) inputStream.nextElement();
System.out.println(url.getFile());
}
If you really want to treat JAR files like directories, then please have a look at TrueZIP 7. Something like the following might be what you want:
URL url = ... // whatever
URI uri = url.toURI();
TFile file = new TFile(uri); // File-look-alike in TrueZIP 7
if (file.isDirectory()) // true for regular directories AND JARs if the module truezip-driver-file is on the class path
for (TFile entry : file.listFiles()) // iterate top level directory
System.out.println(entry.getPath()); // or whatever
Regards,
Christian
