Check if a URL refers to a file or directory (HTTP)

How can I determine whether a URL refers to a file or a directory? For example, http://example.com/test.txt should be reported as a file and http://example.com/dir/ as a directory.
I know you can do this with the Uri class, but that object's IsFile function only works with the file:/// scheme, and I am working with the http:// scheme. Any ideas?
Thanks

I'm not sure why this question was left unanswered for so long. I faced the same situation in server-side Java (I reckon it would be similar on Android). The only input is a URL to the resource, and we need to tell whether the resource is a directory or a file. Here's my solution:
if ("file".equals(resourceUrl.getProtocol())
        && new File(resourceUrl.toURI()).isDirectory()) {
    // it's a directory
}
Hope this helps the next reader.
Note: please see @awwsmm's comment. That was my assumption when I provided the answer above. Basically, it doesn't make sense to test whether a remote resource is a directory or anything else; it is entirely up to the site to decide what to return for each request.

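It won't work, because the protocol would be http, not file.
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

class TestURL {
    public static boolean isDirectory(URL resourceUrl) throws URISyntaxException {
        // Only file: URLs can be mapped to a local File
        if ("file".equals(resourceUrl.getProtocol())
                && new File(resourceUrl.toURI()).isDirectory()) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Prints false: the protocol is http, so the File check is never reached
        System.out.println(TestURL.isDirectory(new URL("http://example.com/mydir/")));
    }
}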

I think a directory URL such as http://example.com/some/dir will redirect to http://example.com/some/dir/, while a file URL will not.
That is, one can examine the HTTP Location field in the HEAD response:
$ curl -I http://example.com/some/dir | grep Location
Location: http://example.com/some/dir/
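The curl check above can be sketched in Java. This is only a sketch, assuming the server follows the common (but not guaranteed) convention of redirecting a directory URL without a trailing slash to the same URL with one. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class DirectoryRedirectCheck {

    // True when the Location header (absolute or relative) resolves to the
    // requested URL plus a trailing slash.
    public static boolean looksLikeDirectoryRedirect(String requested, String location) {
        if (location == null || requested.endsWith("/")) {
            return false;
        }
        String target = URI.create(requested).resolve(location).toString();
        return target.equals(requested + "/");
    }

    // Sends a HEAD request without following redirects and inspects Location.
    // Assumption: a 3xx answer pointing at "<url>/" means "directory".
    public static boolean probablyDirectory(String requested) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(requested).openConnection();
        conn.setRequestMethod("HEAD");
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            boolean redirect = status >= 300 && status < 400;
            return redirect
                    && looksLikeDirectoryRedirect(requested, conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }
}
```

A false result does not prove the resource is a file; it only means the server did not answer with this particular redirect.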

In my case I needed to download a file over HTTPS, and there were occasions when the server was misconfigured to redirect the request, so only HTTP data would be downloaded.
I was able to "decide" whether the requested resource was a file by inspecting its content type (MIME type).
String[] allowedMimeTypes = new String[] { "application/octet-stream", "image/gif",
"text/css", "text/csv", "text/plain", "text/xml" };
URL website = new URL("https://localhost/public/" + file);
HttpsURLConnection huc = (HttpsURLConnection) website
.openConnection();
if (!Arrays.asList(allowedMimeTypes).contains(huc.getContentType())) {
throw new Exception("Not a file...");
}
When the response contained text/html;charset=utf-8 or similar, I could tell that it was not a file.
Also note that these MIME types are usually configurable on the server side.
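One caveat with an exact string match on the content type is that servers may append parameters, so an allowed type such as text/plain can arrive as "text/plain; charset=utf-8". A minimal sketch of a more tolerant comparison follows. The class and method names are hypothetical, and the allowed list is copied from the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class MimeTypeCheck {

    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "application/octet-stream", "image/gif",
            "text/css", "text/csv", "text/plain", "text/xml"));

    // Strips parameters such as ";charset=utf-8" and lower-cases the media type.
    public static String baseType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True when the media type, ignoring parameters, is in the allowed list.
    public static boolean isAllowed(String contentType) {
        return ALLOWED.contains(baseType(contentType));
    }
}
```

With this helper, the check in the snippet above could be written as MimeTypeCheck.isAllowed(huc.getContentType()).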

Related

Why my HTTPS file download corrupts .zip files?

I'm trying to download zip files from the internet using the following code:
public void getFile(String updateURL) throws Exception {
URL url = new URL(updateURL);
HttpURLConnection httpsConn = (HttpURLConnection) url.openConnection();
httpsConn.setRequestMethod("GET");
TrustModifier.relaxHostChecking(httpsConn);
int responseCode = httpsConn.getResponseCode();
if (responseCode == HttpsURLConnection.HTTP_OK) {
String fileName = "fileFromNet";
try (FileOutputStream outputStream = new FileOutputStream(fileName)) {
ReadableByteChannel rbc = Channels.newChannel(httpsConn.getInputStream());
outputStream.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
}
}
httpsConn.disconnect();
}
TrustModifier is a class used to solve the "trust issue": http://www.obsidianscheduler.com/blog/ignoring-self-signed-certificates-in-java/
The code above works well for zip files available via plain http, and for non-compressed files exposed via https, but if I try to download a zip file exposed via an https endpoint, only a small fragment of the original file is downloaded. I have tested with different download links from the internet and always got the same result.
Does anybody have an idea what I'm doing wrong here?
Thank you.

transferFrom() must be called in a loop until the transfer is complete, and in this case the only way you can know that is by adding up the return values of transferFrom() until they equal the Content-length of the HTTP response.
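A minimal sketch of such a loop is shown below. The class and method names are hypothetical. The expected length would come from the Content-Length header (for example via getContentLengthLong()), and a negative value is treated here as "unknown". Treating a return value of 0 as end of stream is an assumption that only holds for a blocking source channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

public class ChannelCopy {

    // Calls transferFrom() repeatedly, summing the return values, until
    // `expected` bytes have arrived. If `expected` is negative (unknown),
    // it keeps going until a call transfers nothing.
    public static long copyFully(ReadableByteChannel src, FileChannel dst, long expected)
            throws IOException {
        long position = 0;
        while (expected < 0 || position < expected) {
            // Ask for at most 1 MiB per call; fewer bytes may be transferred.
            long transferred = dst.transferFrom(src, position, 1 << 20);
            if (transferred <= 0) {
                break; // assumption: a blocking source has reached end of stream
            }
            position += transferred;
        }
        return position;
    }
}
```

The caller can compare the returned count with the expected length to detect a truncated download.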

Actually, the problem was in the TrustModifier class I was using to switch off the server certificate check. Once I removed it, because I no longer needed it (I took the certificate from the server and put it in a local trust store), my problem was solved.

How to check given domain name http or https in java?

My problem
In my Android application I get a URL from the user, like "www.google.com".
I want to find out whether to use http or https for the given URL.
What I have tried
After referring to some Stack Overflow questions, I tried getScheme():
try {
String url_name="www.google.com";
URI MyUri = new URI(url_name);
String http_or_https="";
http_or_https=MyUri.getScheme();
url_name=http_or_https+"://"+urlname;
Log.d("URLNAME",url_name);
}
catch (Exception e) {
e.printStackTrace();
}
But the code above throws an exception.
My question
Is the getScheme() approach above correct or not?
If it is incorrect, how can I find out whether the URL uses http or https?

A domain name is a domain name, it has nothing to do with protocol. The domain is the WHERE, the protocol is the HOW.
The domain is the location you want to go, the protocol is how do you go there, by bus, by plane, by train or by boat. It makes no sense to ask 'I want to go to the store, how do I ask the store if I should go by train or by car?'
The reason this works in browsers is that the browser usually tries to connect using both http and https if no protocol is supplied.
If you know the URL, do this:
public void decideProtocol(URL url) throws IOException {
if ("https".equals(url.getProtocol())) {
// It is https
} else if ("http".equals(url.getProtocol())) {
// It is http
}
}

You can check whether the given URL is http or https by using URLUtil.
URLUtil.isHttpUrl(String url) returns true if the URL is an http: URL.
URLUtil.isHttpsUrl(String url) returns true if the URL is an https: URL.

You could use the Apache UrlValidator.
You can specify the allowed URL schemes, and in your case the code would look something like this:
String[] schemes = {"https"};
UrlValidator urlValidator = new UrlValidator(schemes);
// http is not in the allowed schemes, so this URL is reported as invalid
if (urlValidator.isValid("http://foo.bar.com/")) {
    System.out.println("url is valid");
} else {
    System.out.println("url is invalid");
}
// https is the only allowed scheme here
if (urlValidator.isValid("https://foo.bar.com/")) {
    System.out.println("url is valid");
} else {
    System.out.println("url is invalid");
}

A hierarchical URI is subject to further parsing according to the syntax
[scheme:][//authority][path][?query][#fragment]
so your url_name lacks a scheme.
If url_name is "https://www.google.com", the scheme is https.
Reference: http://docs.oracle.com/javase/7/docs/api/java/net/URI.html
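To make this concrete: for the input "www.google.com", getScheme() returns null, because the whole string is parsed as a path. A minimal sketch that prepends a default scheme when none is present is shown below. The class name, method name and the choice of default are assumptions for illustration, not something the URI class provides.

```java
import java.net.URI;

public class SchemeDefault {

    // Returns the input unchanged if it already has a scheme; otherwise
    // prepends the given default scheme.
    // Caveat: an input with a port but no scheme (host:8080) is likely to be
    // parsed as having a scheme, so this sketch does not handle that case.
    public static String withDefaultScheme(String input, String defaultScheme) {
        String trimmed = input.trim();
        URI uri = URI.create(trimmed); // throws IllegalArgumentException on bad syntax
        if (uri.getScheme() != null) {
            return trimmed;
        }
        return defaultScheme + "://" + trimmed;
    }
}
```

For example, SchemeDefault.withDefaultScheme("www.google.com", "https") yields "https://www.google.com". Whether https is actually the right choice for a given host still has to be established by connecting.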

You can't get it from just URL. It doesn't make sense. Some websites can work with both http & https. It depends upon website itself whether they use SSL certificate or not.

As was pointed out, the user may not bother to provide the full URL including the protocol. The app could then use a trial-and-error approach: try to establish a connection with each protocol in a list (http, https) and treat the first one that succeeds as the default. This is about usability; it is better than annoying an inattentive user with an ugly error message.
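A rough sketch of that trial-and-error idea follows. All names are hypothetical, and what counts as a "successful hit" (here, any HTTP status from 200 to 399 on a HEAD request) is an assumption. The probe is passed in as a parameter so that the selection logic can be exercised without network access.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class SchemeProbe {

    // Returns the first candidate (scheme + "://" + hostAndPath) that the
    // probe accepts, trying the schemes in the given order.
    public static Optional<String> firstWorking(String hostAndPath, List<String> schemes,
                                                Predicate<String> probe) {
        for (String scheme : schemes) {
            String candidate = scheme + "://" + hostAndPath;
            if (probe.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Example probe: a HEAD request; any failure is treated as "does not work".
    public static boolean respondsOk(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                int status = conn.getResponseCode();
                return status >= 200 && status < 400;
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            return false;
        }
    }
}
```

A call might look like SchemeProbe.firstWorking("www.google.com", Arrays.asList("https", "http"), SchemeProbe::respondsOk). Listing https first prefers the secure protocol when both work.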

ranjith - I would store some sort of mapping between URI (main domain) and preferred protocol within the application. You can start with predefined mappings based on what you know and then let the mapping grow as more URIs are added.

Determine that URL will / have been redirected to a different URL programmatically

I need to download a bunch of APKs from a market, analyze each file for ads using the Multi-apk tool, and compress it again (this is not the issue, so I won't discuss it any further).
Right now, I'm working on the part that downloads the APKs from a particular market. I get the source code of the page and extract all URLs that end with .APK.
After each URL is extracted from the page, I download it using Commons IO. However, whenever a download completes, I notice that all the files are the same size, which makes me think the same file is being downloaded each time.
Later on, I realized that I am being redirected to a different URL.
What I want to do is determine whether a URL redirects me to a different URL.
I tried getting the response code of the URL. It gives me 200, but once I'm connected, it later redirects me to a different page.
Here's the code I used to check the response code of the URL.
boolean urlValidity =true;
try
{
URLConnection urlConn = new URL(url).openConnection();
String RC = urlConn.getHeaderField(0);
System.out.println("RC " + RC);
if ( (RC == null) || (!(RC.contains("200"))) )
{
urlValidity = false;
}
}
catch(Exception e)
{
return false;
}
return urlValidity;
I need this so that I don't have to continue downloading the APK at that URL.
Is there a way to do this programmatically?

You should call setInstanceFollowRedirects(false) to ensure you don't get redirected. This also means much less content is downloaded from the connection you make.
I recently had a project where I had to find how many hops a URL makes to reach its final URL, and I could use the LongURL API for that.
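A minimal sketch of the setInstanceFollowRedirects(false) approach is shown below; the class and method names are made up for illustration. It only detects HTTP-level redirects (3xx status with a Location header). If the server answers 200 and redirects later through HTML or JavaScript, as the question's 200 response may suggest, this check will not see it.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class RedirectInspector {

    // Status codes that normally carry a Location header.
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Location may be relative, so resolve it against the requested URL.
    public static String resolveLocation(String requested, String location) {
        return URI.create(requested).resolve(location).toString();
    }

    // Returns the redirect target, or null if the server did not redirect.
    public static String redirectTarget(String requested) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(requested).openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (isRedirect(status) && location != null) {
                return resolveLocation(requested, location);
            }
            return null;
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling redirectTarget repeatedly on its own result, with an upper bound on the number of iterations, would count the hops to the final URL.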

How to know the file served in a web request

My question is simple. For a web server request without a file name, like
http://whatever.com/path_1/path_b/
the response is chosen using the "welcome-file-list" (in the web.xml of a Tomcat app), which lists the possible default pages (default.html, index.html, etc.).
Is there a way of knowing exactly which file, or file name, is served by the web server?
I need to check the file in a filter, but it doesn't seem to be available in the request or response objects.
Thanks

No, you can't. The purpose of hiding the physical details of a file and its access behind a web server is to provide streamlined and secure access to the server's resources.

Well, I've tried this solution, but I don't know whether it is too resource-consuming.
In my servlet filter, I check the welcome file list for the first file that exists.
String fullURL = httpRequest.getRequestURL().toString();
String realPath = this.servletContext.getRealPath(filePath);
// if the URL does not end in a file name
if (fullURL.endsWith("/")) {
    // welcome file list
    for (String wf : (String[]) servletContext.getAttribute("org.apache.catalina.WELCOME_FILES")) {
        // first encountered, first used: the first one that exists will be the one used
        if (new File(realPath, wf).exists()) {
            fullURL += wf;
            // File.separator instead of a hard-coded backslash, so this also works off Windows
            realPath += File.separator + wf;
            break;
        }
    }
}
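The lookup itself can be isolated from the servlet API. The following is a sketch with hypothetical names, assuming the directory that backs the request and the ordered welcome file names are already known.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

public class WelcomeFileResolver {

    // Returns the first welcome file name that exists as a regular file in
    // the given directory, in list order, or empty if none exists.
    public static Optional<String> firstExisting(Path directory, List<String> welcomeFiles) {
        for (String name : welcomeFiles) {
            if (Files.isRegularFile(directory.resolve(name))) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }
}
```

Each request costs at most one file-system check per welcome file, so caching the result per directory is an option if the cost turns out to matter.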

redirecting between java servlets from url containing #

Hey,
Maybe the title is not the best choice, but I really don't know how to describe the problem better.
The thing is, when you point your browser to a URL that contains #
http://anydomain.com/test/elsem/1234#dogeatdog
and for some reason (e.g. business logic) you want to redirect to another page
http://anydomain.com/test/els/1234
the #dogeatdog will be added to the new URL.
I found this behavior while developing a Wicket app, but I have just tested it with a simple pure Java servlet. Can someone explain it to me?
Here is the code, just in case I'm doing something wrong:
private void process(HttpServletRequest req, HttpServletResponse res)
{
res.setContentType("text/plain");
try
{
HttpSession session = req.getSession();
Object as = session.getAttribute("as");
if (as == null)
{
log.info("redirecting");
session.setAttribute("as", 1);
res.sendRedirect("/test/");
}
else
{
log.info("writing");
PrintWriter out = res.getWriter();
out.write("after redirect "+as);
out.flush();
}
}
catch (IOException e)
{
e.printStackTrace();
}
}

Hash fragments (#a_hash_fragment) never leave the browser, they are not part of HTTP request.
What the web server gets in this case is GET /test/elsem/1234, and it responds with redirect 3xx code and the new url /test/els/1234, which your browser picks and appends #dogeatdog. Makes sense now?
UPDATE: Thanks to Zack, here's a W3C document that exactly explains how this (should) work:
http://www.w3.org/Protocols/HTTP/Fragment/draft-bos-http-redirect-00.txt
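The split between what the server sees and what the browser keeps can be illustrated with java.net.URI. The sketch below only models the behavior described in this answer, with made-up class and method names; it is not a statement of what every browser does.

```java
import java.net.URI;

public class FragmentRedirect {

    // What goes on the HTTP request line: path and query, never the fragment.
    public static String requestTarget(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    // Models the browser: resolve the Location value against the requested
    // URL and, if the new URL has no fragment of its own, carry the old one over.
    public static String afterRedirect(String requested, String location) {
        URI base = URI.create(requested);
        URI target = base.resolve(location);
        if (target.getRawFragment() == null && base.getRawFragment() != null) {
            return target + "#" + base.getRawFragment();
        }
        return target.toString();
    }
}
```

For the URLs in the question, requestTarget gives /test/elsem/1234, and afterRedirect with the location /test/els/1234 gives http://anydomain.com/test/els/1234#dogeatdog.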

From the sendRedirect Javadoc:
Sends a temporary redirect response to the client using the specified
redirect location URL. This method can accept relative URLs; the
servlet container must convert the relative URL to an absolute URL
before sending the response to the client. If the location is relative
without a leading '/' the container interprets it as relative to the
current request URI. If the location is relative with a leading '/'
the container interprets it as relative to the servlet container root.
Because of repetitive use of "relative" in the Javadoc, I suspect the new URL is using what it can from the old URL and then building from there...
In the brief amount of what I've read, forwarding should be used if possible instead of redirect.
See this for a good explanation of forward versus redirect.
See this for straight-forward examples of forwarding requests to Servlets or JSPs.
Of course, with forwarding, the original URL will remain intact so that may not be what you're looking for...
EDIT
With information from milan, I found some more information regarding URL fragments (the stuff after "#" - I didn't know that was their official name until corresponding with milan).
There's another SOF post that has some good information concerning this and possibly the best answer: URL Fragment and 302 redirects
I have "+1'd" milan for giving good direction on this...
