Why does URI allow missing protocol (while URL does not)?
According to Wikipedia, the scheme (and even the path) seem to be mandatory components of a URI:
The URI generic syntax consists of a hierarchical sequence of five
components:[8]
URI = scheme:[//authority]path[?query][#fragment]
Or does a missing protocol default to something (like http)? I found nothing about this in the docs.
new URI("my.html"); // 1
new URI("xabc:my.html"); // 2
new URL("my.html"); // 3
new URL("xabc:my.html"); // 4
Concerning the "obligatory" path: OK, there are opaque URIs. But why is a missing protocol allowed? (It should be present even for an opaque URI, which is required to be absolute.)
I could understand that a relative URL/URI doesn't require a protocol (<img src="/images/pic.png">), but URL throws a run-time java.net.MalformedURLException: no protocol in this case too (while URI does not).
Your relative path must be wrong,
Java's URI class allows an empty scheme for relative URIs:
relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:
docs/guide/collections/designfaq.html#28
Scheme is optional:
[scheme:]scheme-specific-part[#fragment]
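URL is similar, but it accepts a relative spec only when that spec is resolved against a context URL, e.g.:
URL url = new URL(new URL("http://www.example.com/"), "/guidelines.txt");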
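As a rough illustration of the difference discussed above, the following sketch runs the four constructor calls from the question on a standard JDK and then resolves a relative spec against a context URL. The class name, the helper names, and the www.example.com base are made up for this example; on Java 20 and later the URL constructors are deprecated but should still behave the same way.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UriVsUrlDemo {

    /** Returns the scheme java.net.URI parsed from the text, or null for a relative reference. */
    static String uriScheme(String text) {
        try {
            return new URI(text).getScheme();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns true if new URL(text) succeeds, false if it throws MalformedURLException. */
    @SuppressWarnings("deprecation") // URL constructors are deprecated since Java 20
    static boolean urlAccepts(String text) {
        try {
            new URL(text);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Resolves a relative spec against a context URL, which java.net.URL does support. */
    @SuppressWarnings("deprecation")
    static String resolveAgainst(String base, String spec) {
        try {
            return new URL(new URL(base), spec).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(uriScheme("my.html"));       // null: a relative reference
        System.out.println(uriScheme("xabc:my.html"));  // xabc
        System.out.println(urlAccepts("my.html"));      // false: no protocol
        System.out.println(urlAccepts("xabc:my.html")); // false: no handler for xabc
        System.out.println(resolveAgainst("http://www.example.com/docs/", "/guidelines.txt"));
    }
}
```

The asymmetry comes from what each class is for: URI only parses identifiers, so a relative reference is legal, while URL must be able to pick a protocol handler and therefore needs a scheme, either in the string itself or inherited from a context URL.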
Related
What is a hierarchical URI strictly speaking?
Somewhere I saw a definition saying that a hierarchical URI must have a scheme and a path.
If hierarchical = not opaque, then a hierarchical URI should also have a scheme.
Can there be a hierarchical URI without a scheme (for example, a relative URI)?
Yes, that is a hierarchical URI that is scheme-relative and host-relative (but path-absolute). (This assumes that the omitted scheme is a hierarchical one such as "http://" or "file://".)
See section 1.2.3 of RFC 3986 (https://www.ietf.org/rfc/rfc3986.txt) for more information, including the canonical definition of a hierarchical URI.
A relative reference (Section 4.2) refers to a resource by describing
the difference within a hierarchical name space between the reference
context and the target URI.
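To make the quoted definition concrete, here is a small sketch using java.net.URI.resolve, which combines a base URI (the reference context) with a relative reference to produce the target URI. The example.com base and the class and method names are assumptions chosen for illustration only.

```java
import java.net.URI;

class RelativeResolveDemo {

    /** Resolves a relative reference against an absolute, hierarchical base URI. */
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";

        // Path-absolute reference: scheme and host come from the base.
        System.out.println(resolve(base, "/images/pic.png"));
        // -> http://example.com/images/pic.png

        // Path-relative reference: resolved against the base's directory.
        System.out.println(resolve(base, "guide/faq.html#28"));
        // -> http://example.com/docs/guide/faq.html#28

        // The reference on its own is hierarchical but not absolute.
        URI ref = URI.create("/images/pic.png");
        System.out.println(ref.isAbsolute() + " " + ref.isOpaque()); // false false
    }
}
```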
I was playing around with non-stringy types for an application loader I've been developing. Because of a typo, I forgot to include the protocol part of a specific URI. I expected the Java test to fail due to an invalid URI... however, this statement seems to work:
URI uri = URI.create("contacts.addresses.genericAddress");
As far as I know, there's no standard for using a dot in the scheme part... and I thought the scheme part was always required?
Does anyone know why?
I'll add my comment as an answer because I think it's correct:
From the Java URI documentation: "specified by the grammar in RFC 2396, Appendix A", and Appendix A allows a URI to be a relative path, with no host name or scheme. So "this.and.that" might just be a file name like "this.html" (dots are valid in a file element name, i.e., they are pchars in a path segment).
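A quick way to check this is to ask java.net.URI how it parsed the string. In the sketch below (the class and method names are invented for the example), the dotted name comes back with no scheme and with the whole string as its path:

```java
import java.net.URI;

class DottedNameDemo {

    /** Summarizes how java.net.URI parsed the given text. */
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
                + ", path=" + uri.getPath()
                + ", absolute=" + uri.isAbsolute()
                + ", opaque=" + uri.isOpaque();
    }

    public static void main(String[] args) {
        System.out.println(describe("contacts.addresses.genericAddress"));
        // scheme=null, path=contacts.addresses.genericAddress, absolute=false, opaque=false
    }
}
```

If a test needs to reject such input, checking `uri.isAbsolute()` (or `uri.getScheme() != null`) after parsing is one option.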
Is there a clean and spec-conformant way to define a custom URL scheme that acts as an adapter on the resource returned by another URL?
I have already defined a custom URL protocol which returns a decrypted representation of a local file. So, for instance, in my code,
decrypted-file:///path/to/file
transparently decrypts the file you would get from file:///path/to/file. However, this only works for local files. No fun! I am hoping that the URL specification allows a clean way that I could generalize this by defining a new URL scheme as a kind of adapter on existing URLs.
For example, could I instead define a custom URL scheme decrypted: that could be used as an adapter that prefixes another absolute URL that retrieves a resource? Then I could just do
decrypted:file:///path/to/file
or decrypted:http://server/path/to/file or decrypted:ftp://server/path/to/file or whatever. This would make my decrypted: protocol composable with all existing URL schemes that do file retrieval.
Java does something similar with the jar: URL scheme but from my reading of RFC 3986 it seems like this Java technology violates the URL spec. The embedded URL is not properly byte-encoded, so any /, ?, or # delimiters in the embedded URL should officially be treated as segment delimiters in the embedding URL (even if that's not what JarURLConnection does). I want to stay within the specs.
Is there a nice and correct way to do this? Or is the only option to byte-encode the entire embedded URL (i.e., decrypted:file%3A%2F%2F%2Fpath%2Fto%2Ffile, which is not so nice)?
Is what I'm suggesting (URL adapters) done anywhere else? Or is there a deeper reason why this is misguided?
There's no built-in adaptor in Cocoa, but writing your own using NSURLProtocol is pretty straightforward for most uses. Given an arbitrary URL, encoding it like so seems simplest:
myscheme:<originalurl>
For example:
myscheme:http://example.com/path
At its simplest, NSURL only actually cares if the string you pass in is a valid URI, which the above is. Yes, there is then extra URL support layered on top, based around RFC 1808 etc. but that's not essential.
All that's required to be a valid URI is a colon to indicate the scheme, and no invalid characters (basically, ASCII without spaces).
You can then use the -resourceSpecifier method to retrieve the original URL and work with that.
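For comparison, a roughly equivalent unwrapping step can be sketched in Java with java.net.URI, where getRawSchemeSpecificPart() plays a role similar to -resourceSpecifier. This is only an illustration of the myscheme:<originalurl> idea, not a full protocol handler; the class name, the method name, and the decrypted scheme are taken from the question or made up for the example.

```java
import java.net.URI;

class AdapterUriDemo {

    /**
     * Strips the adapter scheme from a URI of the form adapterScheme:<originalurl>
     * and returns the embedded URI.
     */
    static URI unwrap(String text, String adapterScheme) {
        URI outer = URI.create(text);
        // An embedded absolute URL makes the outer URI opaque in java.net.URI terms,
        // because its scheme-specific part does not start with a slash.
        if (!adapterScheme.equalsIgnoreCase(outer.getScheme()) || !outer.isOpaque()) {
            throw new IllegalArgumentException("Not a " + adapterScheme + ": adapter URI: " + text);
        }
        return URI.create(outer.getRawSchemeSpecificPart());
    }

    public static void main(String[] args) {
        System.out.println(unwrap("decrypted:file:///path/to/file", "decrypted"));
        // -> file:///path/to/file

        // A fragment belongs to the outer URI, so it is not part of the embedded one.
        System.out.println(unwrap("decrypted:http://server/path/to/file?x=1#frag", "decrypted"));
        // -> http://server/path/to/file?x=1
    }
}
```

The second call shows the limitation the question worries about: without percent-encoding, a `#` in the embedded URL is interpreted as the fragment delimiter of the embedding URI.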
When using ProducerTemplate.sendBodyAndHeader() to send a file using the "file" scheme to its destination, and the file path in the URI contains ampersands, it fails to deliver the file with the following errors.
org.apache.camel.ResolveEndpointFailedException: Failed to resolve endpoint:
file:///c%7C/IMM_SAN/Marketing/f77333bd-f96f-4873-b846-2f1dc5531a5a/2596/PB&J%20Generic%2007064782/transcoded/21726
due to: Failed to resolve endpoint:
file:///c%7C/IMM_SAN/Marketing/f77333bd-f96f-4873-b846-2f1dc5531a5a/25964/PB&J%20Generic%2007064782/transcoded/21726
due to: Invalid uri syntax: no ? marker however the uri has & parameter separators. Check the uri if its missing a ? marker.
I spent a few days trying the different overloads for sending the file: send(), sendBody(), sendBodyAndHeader() and even sendBodyAndHeaders().
I tried to URLEncoder.encode() it beforehand, and of course that was a no go.
I even debugged URISupport.normalizeUri(String uri) in the camel-core source and discovered something interesting. Apparently no amount of encoding beforehand will do me any good, because the method does its own encoding, and that encoding appears to be incorrect: it puts the ampersand back into the URI before sending. I think this is a bug in sendBodyAndHeader(). Why does it do that? We have an application that reads files from one department and writes them to a share, and another system automatically picks those files up and delivers them once processing of a file is finished.
See below: Camel's URISupport.normalizeUri(String uri) method encodes the URI here, and this puts the ampersand back into the file path.
URI u = new URI(UnsafeUriCharactersEncoder.encode(uri));
So you see, no amount of preprocessing of the file path in the URI is going to work, because sendBodyAndHeader() normalizes the URI regardless. I would like to add a new overload to this API that turns off normalization and sends the URI as is, but I wanted to check here first to see whether anybody has a less drastic option. Please note that this is a problem when ampersands are in the URI path for file schemes.
ProducerTemplate prod = exchange.getContext().createProducerTemplate();
destPath = destPath.replace(':', '|');
destPath = destPath.replaceAll("\\\\", "/");
destPath = destPath.replaceAll("&", "%26"); // replace the ampersand
String query = "file:///" + destPath;
prod.sendBodyAndHeader(query, exchange.getIn().getBody(), Exchange.FILE_NAME, destFileName);
Use the CamelFileName header to avoid messing up the endpoint URI with the reserved character & if you really need that character in the file path.
This example would put a file into c:\a&b:
public void sendAnyFile(Exchange e) {
    ProducerTemplate pt = getContext().createProducerTemplate();
    pt.sendBodyAndHeader("file:///c:/", e.getIn().getBody(String.class), "CamelFileName", "a&b/hej.txt");
}
I have existing code that uses java.net.URL instead of java.net.URI all over the codebase.
Also, the code has a URL parser that parses URLs appearing in some text body. All URLs that do not have a protocol prefix, such as www.google.com, are considered malformed when they are converted to URL objects.
Is there a clean way to handle such cases in Java?
Create a URI and see if it has a scheme. Set the scheme, or reconstruct the URI with a scheme argument, if not present. Convert to URL.
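A minimal sketch of that approach follows. The class and method names are hypothetical, and using http as the fallback is an assumption made for the example. Since java.net.URI is immutable, "setting the scheme" here means building a new URI from the original text with the scheme prepended.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class SchemeDefaulter {

    /** Returns the parsed URI, prepending defaultScheme when the text has no scheme. */
    static URI withDefaultScheme(String text, String defaultScheme) {
        URI uri = URI.create(text);
        if (uri.getScheme() != null) {
            return uri; // already absolute, keep as is
        }
        // Network-path references ("//host/path") already carry the double slash.
        String prefix = text.startsWith("//") ? defaultScheme + ":" : defaultScheme + "://";
        return URI.create(prefix + text);
    }

    /** Converts to java.net.URL, wrapping the checked exception for convenience. */
    static URL toUrl(String text, String defaultScheme) {
        try {
            return withDefaultScheme(text, defaultScheme).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUrl("www.google.com", "http"));         // http://www.google.com
        System.out.println(toUrl("https://example.com/a", "http"));  // https://example.com/a
        System.out.println(toUrl("//example.com/a", "http"));        // http://example.com/a
    }
}
```

One caveat: input such as localhost:8080 parses as a URI whose scheme is "localhost", so it would be passed through unchanged and then rejected by toURL(). Text that is really a path (for example /images/pic.png) would also get the scheme prepended and end up without a host, so a real parser probably needs extra checks for those cases.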