How to normalise a URL in Java to remove the fragment. I.e. from https://www.website.com#something to https://www.website.com
This is possible with the URL.Normalize code, although in this specific use case I've only got a full absolute URL which needs to remain intact.
I'd like to be able to modify this code slightly to remove the fragment from the URL;
//The website below is just an example. In reality, this URL is unknown and could be anything. Both with and without a fragment depending on the use case
URL absUrl = new URL("https://www.website.com#something");
My thinking so far is that this is only possible by breaking the URL down into protocol + domain + path and then joining it all back together. That does appear to work, but there must be a more elegant way of doing this.
Fragment removal is fairly simple using the conversion methods toURI and toURL. So to convert a URL to a URI:
URL url = /*what have you*/ …
URI u = url.toURI();
To remove any fragment from the URI:
if( u.getFragment() != null ) { // Remake with same parts, less the fragment:
u = new URI( u.getScheme(), u.getSchemeSpecificPart(), /*fragment*/null ); }
In reconstructing a URI from its parts like that, it’s important to use the decoded getters (as shown), not the corresponding raw ones. For authority on this usage, see e.g. the Identity section of the API.
To convert the result back to a URL:
url = u.toURL();
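The three steps above (URL to URI, drop the fragment, URI back to URL) can be put together as a minimal sketch. The class name FragmentStripper and the choice to take and return a String are assumptions made here for illustration, not part of the answer:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class FragmentStripper {

    /** Returns the given absolute URL without its fragment; all other parts are kept. */
    static String stripFragment(String absoluteUrl) {
        try {
            URI u = new URL(absoluteUrl).toURI();
            if (u.getFragment() != null) {
                // Rebuild from the decoded scheme-specific part, passing null as the fragment.
                u = new URI(u.getScheme(), u.getSchemeSpecificPart(), null);
            }
            return u.toURL().toString();
        } catch (MalformedURLException | URISyntaxException e) {
            // Hypothetical error handling: surface bad input as an unchecked exception.
            throw new IllegalArgumentException("Not a valid URL: " + absoluteUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripFragment("https://www.website.com#something"));
        System.out.println(stripFragment("https://www.website.com/path?q=1#something"));
    }
}
```

A URL that has no fragment passes through unchanged, so the method can be called without checking first.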
Fragments do not exist as a separate entity in Java URLs. But you can convert a URL into a URI and back to remove a fragment. I did it like this:
URL url;
...
if (url.toString().contains("#")) {
URI uri = null;
try {
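// Copy every part except the fragment (last argument), so the port and query are kept.
// URL getters return raw text, so input that is already percent-encoded
// will be encoded a second time by this constructor.
uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), null);
String file = "";
if (uri.getRawPath() != null) {
file += uri.getRawPath();
}
if (uri.getRawQuery() != null) {
file += "?" + uri.getRawQuery();
}
url = new URL(uri.getScheme(), uri.getHost(), uri.getPort(), file);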
} catch (URISyntaxException e) {
...
} catch (MalformedURLException e) {
...
}
}
I need to get host from this url
android-app://com.google.android.googlequicksearchbox?Pub_id={siteID}
java.net.URL and java.net.URI can't handle it.
The problem is in { and } characters which are not valid for URI. Looks like a placeholder that wasn't resolved correctly when creating a URI.
You can use String.replaceAll() to get rid of these two characters:
String value = "android-app://com.google.android.googlequicksearchbox?Pub_id={siteID}";
URI uri = URI.create(value.replaceAll("[{}]", ""));
System.out.println(uri.getHost()); // com.google.android.googlequicksearchbox
You see, eventually I need path, scheme and query.
I've just found a super fast library for parsing such URLs: https://github.com/anthonynsimon/jurl
It's also very flexible.
You can try the following code
String url = "android-app://com.google.android.googlequicksearchbox?Pub_id={siteID}";
url = url.replace("{", "").replace("}","");
URI u;
try {
u = new URI(url);
System.out.println(u.getHost());
} catch (URISyntaxException e) {
e.printStackTrace();
}
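Both answers above delete the braces, which also changes the query value. A variation, offered here as an assumption rather than something the answers propose, is to percent-encode the braces instead, so that scheme, host and the original query all remain available. The class and method names are hypothetical:

```java
import java.net.URI;

class PlaceholderUriDemo {

    /** Percent-encodes '{' and '}' so that java.net.URI accepts the string. */
    static URI parseKeepingBraces(String raw) {
        return URI.create(raw.replace("{", "%7B").replace("}", "%7D"));
    }

    public static void main(String[] args) {
        URI uri = parseKeepingBraces(
                "android-app://com.google.android.googlequicksearchbox?Pub_id={siteID}");
        System.out.println(uri.getScheme());
        System.out.println(uri.getHost());
        // getQuery() decodes the escapes again, so the placeholder is visible as written.
        System.out.println(uri.getQuery());
    }
}
```

Only the two brace characters are handled; any other character that is illegal in a URI would still make URI.create throw.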
I have a textbox and a submit button in my JSP page. When the form is submitted with a URL in the textbox, I fetch the response for that URL using URLConnection:
String strUrl = request.getParameter("url");
URL url = new URL(strUrl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String(encodedBytes);
connection.setRequestMethod("GET");
connection.connect();
InputStream content = (InputStream) connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
try {
fWriter = new FileWriter(new File("f:\\new.html"));
writer = new BufferedWriter(fWriter);
while ((line = in.readLine()) != null) {
String s = line.toString();
writer.write(s);
}
writer.close();
} catch (Exception e) {
e.printStackTrace();
}
In the resulting HTML page, all the CSS, JS and images are missing, because their relative links now resolve against the local file.
For example, a script is referenced as follows in my generated HTML page:
<script src="/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
But the actual src should be:
<script src="https://www.url.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
I know there are many solutions for replacing every src and href with the URL host, and I found many answers about that.
I used the following solution:
if (s.contains(("href="))) {
if (s.contains("\"../") || s.contains("\"/")) {
s = s.replace("\"../", "\"http://" + url.getHost() + "/");
s = s.replace("\"/", "\"http://" + url.getHost() + "/");
writer.write(s);
out.println(s);
}
}
Now I am able to get the links, but this does not work for all websites. It only helps for sites whose src and href values need nothing more than the host as a prefix.
On some websites, links are defined as href="frmArticles.aspx". In this case it is not enough to prefix the href with the host, because the link is relative to the page's directory, not to the site root. For example, the following URL has href links that resolve differently from its own URL.
http://www.nakkheeran.in/Users/frmMagazine.aspx?M=2
தை தை தை
If I add the host to this href, it becomes:
தை தை தை
This URL does not exist, because the actual URL is:
தை தை தை
There are essentially two ways to get the absolute URL:
Using Jsoup's abs:href attribute getter. It works like this:
Element a = myDoc.select("a").first(); // selects the first link on the page; replace with whatever selector you need to get your link (a element)
String url = a.attr("abs:href"); //gets the absolute url of the link (href attribute)
Note that Jsoup needs the URL of the HTML document so that it can resolve links correctly. This is done automatically if you use Jsoup.connect(myHtmlUrl).get(). If you are parsing HTML from a String or from a file, use the Jsoup.parse() overload that accepts a base URL.
The other way is with Java's built in URL class, which is probably what you should use in your case. You can use it like this:
String absoluteUrl = new URL(new URL("http://example.com/example.html"), "script.js").toString();
Printing absoluteUrl would give:
http://example.com/script.js
To clarify a bit, the first parameter (in this case http://example.com/example.html) is the URL your HTML document came from, and the second parameter ("script.js") is the URL found in your HTML.
In your case, you could use it like:
String absoluteUrl = new URL(new URL("https://www.url.com/"), "/ajax/libs/jquery/2.1.1/jquery.min.js").toString();
Printing absoluteUrl would give:
https://www.url.com/ajax/libs/jquery/2.1.1/jquery.min.js
The URL class has a constructor URL(URL context, String url) that does what you tried doing with regexps.
Edit: In your case the context URL is the source URL of the parsed resource. Let's say you parse something from URL context = new URL("http://example.com/path/to/some.html#where?is+carmen+sandiego"). Then you just take the reference of any link and create a URL ref = new URL(context, src).
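As a rough sketch of the URL(URL context, String spec) approach described in the answers above, applied to the cases from the question: the class name RelativeLinkResolver is hypothetical, and the page URL is passed in as the context. On recent JDKs the URL constructors are marked deprecated, so expect a compiler warning.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeLinkResolver {

    /** Resolves a link found in a page against the URL the page was fetched from. */
    static String resolve(String pageUrl, String link) {
        try {
            URL context = new URL(pageUrl);
            // The two-argument constructor handles "x.aspx", "/x.js" and absolute links alike.
            return new URL(context, link).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + link + " against " + pageUrl, e);
        }
    }

    public static void main(String[] args) {
        // A page-relative link replaces the last path segment of the page URL.
        System.out.println(resolve("http://www.nakkheeran.in/Users/frmMagazine.aspx?M=2", "frmArticles.aspx"));
        // A root-relative link keeps only the scheme and host of the page URL.
        System.out.println(resolve("https://www.url.com/", "/ajax/libs/jquery/2.1.1/jquery.min.js"));
    }
}
```

This avoids the string replacement on "\"/" and "\"../", because the constructor already knows whether a link is relative to the directory, relative to the root, or absolute.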
I have a text field that takes a location (a String) from the user. It could be a file directory (e.g. C:\directory) or a web URL (e.g. http://localhost:8008/resouces). The system will read a predetermined metadata file from the location.
Given the input string, how can I reliably detect whether the location is a file path or a web URL?
So far I have tried.
URL url = new URL(location); // will get MalformedURLException if it is a file based.
url.getProtocol().equalsIgnoreCase("http");
File file = new File(location); // will not hit exception if it is a url.
file.exists(); // return false if it is a url.
I am still struggling to find a best way to tackle both scenarios. :-(
Basically I would not prefer to explicitly check the path using the prefix such as http:// or https://
Is there an elegant and proper way of doing this?
You can check if the location starts with http:// or https://:
String s = location.trim().toLowerCase();
boolean isWeb = s.startsWith("http://") || s.startsWith("https://");
Or you can use the URI class instead of URL. Unlike URL, URI does not throw MalformedURLException:
URI u = new URI(location);
boolean isWeb = "http".equalsIgnoreCase(u.getScheme())
|| "https".equalsIgnoreCase(u.getScheme());
However, new URI() may throw URISyntaxException, for example if location contains a backslash. The best way would be either to use the prefix check (my first suggestion) or to create a URL and catch MalformedURLException; if it is thrown, you know the input cannot be a valid web URL.
If you're open to the use of a try/catch scenario being "elegant", here is a way that is more specific:
try {
processURL(new URL(location));
}
catch (MalformedURLException ex){
File file = new File(location);
if (file.exists()) {
processFile(file);
}
else {
throw new PersonalException("Can't find the file");
}
}
This way, you're getting the automatic URL syntax checking and, that failing, the check for file existence.
You can try:
static public boolean isValidURL(String urlStr) {
try {
URI uri = new URI(urlStr);
return "http".equals(uri.getScheme()) || "https".equals(uri.getScheme());
}
catch (Exception e) {
return false;
}
}
Note that this returns false for anything that invalidates the URL, and also for a URL that is not http/https. A malformed URL is not necessarily a file name, and a well-formed file name may refer to a file that does not exist, so use it in conjunction with your file existence check.
public boolean urlIsFile(String input) {
if (input.startsWith("file:")) return true;
try { return new File(input).exists(); } catch (Exception e) {return false;}
}
This is the best method because it is hassle free and will always return true if you have a file reference. Other solutions do not and cannot cover the plethora of protocol schemes available, such as ftp, sftp, scp, or any future protocol. So this one works for all uses and purposes, with the caveat that the file must exist if the input doesn't begin with the file protocol.
If you read the logic of the function in light of its name, you should see that returning false for a direct path that does not exist is not a bug; it is simply the fact.
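The scheme check and the file-existence check from the answers above can be combined in one place. The following is only a sketch under assumptions: the names LocationClassifier, Kind and classify are made up here, and treating "does not parse and does not exist" as UNKNOWN is one possible policy.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class LocationClassifier {

    enum Kind { WEB, FILE, UNKNOWN }

    /** Classifies user input as a web URL, a file location, or neither. */
    static Kind classify(String location) {
        String trimmed = location.trim();
        try {
            String scheme = new URI(trimmed).getScheme();
            if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                return Kind.WEB;
            }
            if ("file".equalsIgnoreCase(scheme)) {
                return Kind.FILE;
            }
        } catch (URISyntaxException e) {
            // Expected for Windows paths such as C:\directory: backslashes are illegal in a URI.
        }
        // No web or file scheme: fall back to asking the file system.
        return new File(trimmed).exists() ? Kind.FILE : Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("http://localhost:8008/resouces"));
        System.out.println(classify(System.getProperty("java.io.tmpdir")));
    }
}
```

The caller can then dispatch on the returned Kind instead of relying on which exception was thrown.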
So I was attempting to use this String in a URL :-
http://site-test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
In this code: -
String fileToDownloadLocation = //The above string
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
But at this point I get the error: -
java.net.URISyntaxException: Illegal character in query at index 169:Blahblahblah
I realised with a bit of googling this was due to the characters in the URL (guessing the &), so I then added in some code so it now looks like so: -
String fileToDownloadLocation = //The above string
fileToDownloadLocation = URLEncoder.encode(fileToDownloadLocation, "UTF-8");
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
However, when I try and run this I get an error when I try and create the URL, the error then reads: -
java.net.MalformedURLException: no protocol: http%3A%2F%2Fsite-test.testsite.com%2FMeetings%2FIC%2FDownloadDocument%3FmeetingId%3Dc21c905c-8359-4bd6-b864-844709e05754%26itemId%3Da4b724d1-282e-4b36-9d16-d619a807ba67%26file%3D%5C%5Cs604132shvw140%5CTest-Documents%5Cc21c905c-8359-4bd6-b864-844709e05754_attachments%5C7e89c3cb-ce53-4a04-a9ee-1a584e157987%myDoc.pdf
It looks like I can't do the encoding until after I've created the URL else it replaces slashes and things which it shouldn't, but I can't see how I can create the URL with the string and then format it so its suitable for use. I'm not particularly familiar with all this and was hoping someone might be able to point out to me what I'm missing to get string A into a suitably formatted URL to then use with the correct characters replaced?
Any suggestions greatly appreciated!
You need to encode your parameters' values before concatenating them into the URL.
The backslash \ is a special character which has to be escaped as %5C.
Escaping example:
String paramValue = "param\\with\\backslash";
String yourURLStr = "http://host.com?param=" + java.net.URLEncoder.encode(paramValue, "UTF-8");
java.net.URL url = new java.net.URL(yourURLStr);
The result is http://host.com?param=param%5Cwith%5Cbackslash, which is a properly formatted URL string.
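Applied to the download URL from the question, the same idea might look like the sketch below: each parameter value is encoded on its own and only then joined to the base URL. The class and method names are hypothetical, and the parameter names are taken from the question's URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

class DownloadUrlBuilder {

    /** Encodes a single query parameter value (not a whole URL). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Builds the download URI from an already valid base and raw parameter values. */
    static URI build(String base, String meetingId, String itemId, String file) {
        return URI.create(base
                + "?meetingId=" + encode(meetingId)
                + "&itemId=" + encode(itemId)
                + "&file=" + encode(file));
    }

    public static void main(String[] args) {
        System.out.println(build(
                "http://site-test.com/Meetings/IC/DownloadDocument",
                "c21c905c-8359-4bd6-b864-844709e05754",
                "a4b724d1-282e-4b36-9d16-d619a807ba67",
                "\\\\s604132shvw140\\Test-Documents\\myDoc.pdf"));
    }
}
```

Because the scheme, host and the ?, & and = separators are never passed through the encoder, the "no protocol" error from encoding the whole string does not occur.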
I had the same problem. I read the URL from a properties file:
String configFile = System.getenv("system.Environment");
if (configFile == null || "".equalsIgnoreCase(configFile.trim())) {
configFile = "dev.properties";
}
// Load properties
Properties properties = new Properties();
properties.load(getClass().getResourceAsStream("/" + configFile));
//read url from file
apiUrl = properties.getProperty("url").trim();
URL url = new URL(apiUrl);
//throw exception here
URLConnection conn = url.openConnection();
dev.properties
url = "https://myDevServer.com/dev/api/gate"
it should be
dev.properties
url = https://myDevServer.com/dev/api/gate
without "" and my problem is solved.
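To see why the quotes matter: java.util.Properties keeps them as part of the value, so the string handed to new URL(...) starts with a quote character instead of a protocol. A small sketch, with the hypothetical helper names stripQuotes and readUrl, that tolerates either form of the file:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class QuotedPropertyDemo {

    /** Removes one pair of surrounding double quotes, if present. */
    static String stripQuotes(String value) {
        String v = value.trim();
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            return v.substring(1, v.length() - 1);
        }
        return v;
    }

    /** Loads the given properties text and returns the raw value of the "url" key. */
    static String readUrl(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return properties.getProperty("url");
    }

    public static void main(String[] args) {
        String raw = readUrl("url = \"https://myDevServer.com/dev/api/gate\"");
        System.out.println(raw);              // still wrapped in quotes
        System.out.println(stripQuotes(raw)); // usable as a URL
    }
}
```

Fixing the properties file, as described above, remains the simpler remedy; the helper only guards against the mistake recurring.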
According to the Oracle documentation for MalformedURLException:
Thrown to indicate that a malformed URL has occurred. Either no legal protocol could be found in a specification string or the string could not be parsed.
So the exception means that the string you passed could not be parsed as a URL.
You want to use URI templates. Look carefully at the README of this project: URLEncoder.encode() does NOT work for URIs.
Let us take your original URL:
http://site-test.test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
and convert it to a URI template with three variables (on multiple lines for clarity):
http://site-test.test.com/Meetings/IC/DownloadDocument
?meetingId={meetingID}&itemId={itemID}&file={file}
Now let us build a variable map with these three variables using the library mentioned in the link:
final VariableMap vars = VariableMap.newBuilder()
.addScalarValue("meetingID", "c21c905c-8359-4bd6-b864-844709e05754")
.addScalarValue("itemID", "a4b724d1-282e-4b36-9d16-d619a807ba67")
.addScalarValue("file", "\\\\s604132shvw140\\Test-Documents"
+ "\\c21c905c-8359-4bd6-b864-844709e05754_attachments"
+ "\\7e89c3cb-ce53-4a04-a9ee-1a584e157987\\myDoc.pdf")
.build();
final URITemplate template
= new URITemplate("http://site-test.test.com/Meetings/IC/DownloadDocument"
+ "?meetingId={meetingID}&itemId={itemID}&file={file}");
// Generate URL as a String
final String theURL = template.expand(vars);
This is GUARANTEED to return a fully functional URL!
Thanks to Erhun's answer I finally realised that my JSON mapper was returning the quotation marks around my data too! I needed to use "asText()" instead of "toString()"
It's not an uncommon issue - one's brain doesn't see anything wrong with the correct data, surrounded by quotes!
discoveryJson.path("some_endpoint").toString();
"https://what.the.com/heck"
discoveryJson.path("some_endpoint").asText();
https://what.the.com/heck
This code worked for me
public static void main(String[] args) {
try {
java.net.URL url = new java.net.URL("http://path");
System.out.println("Instantiated new URL: " + url);
}
catch (MalformedURLException e) {
e.printStackTrace();
}
}
Instantiated new URL: http://path
Very simple fix
String encodedURL = UriUtils.encodePath(request.getUrl(), "UTF-8");
It works; no extra functionality is needed.
I'm not sure I'm on the right track with one of my methods. It composes the URIs I use for my HTTP requests: it takes data from a separate static class of final Strings, merges them together, and inserts the token received from the server where necessary. As it happens, every URI includes the token except one, the authentication URI. Here is what I've done:
private URI urlComposer(String apiUri, String token) {
URI uri = null;
try {
if(apiUri.equals("POST_AUTH_URL")) {
uri = URIUtils.createURI(null, MyConfig.WEB_SERVER, -1, apiUri, null, null);
return uri;
}
String tmp = apiUri.toString();
String[] array = tmp.split("<token>");
tmp = array[0] + auth.getToken() + array[1];
uri = URIUtils.createURI(null, MyConfig.WEB_SERVER, -1, tmp, null, null);
if (MyConfig.DEBUG) Log.d("Dev", "Constructed url " + uri);
return uri;
} catch (URISyntaxException e) {
if (MyConfig.DEBUG) Log.d("Dev", "urlComposer was unable to construct a URL");
e.printStackTrace();
}
return null;
}
Looking ahead, I don't like the idea of adding more if/else statements if I end up with more special cases like POST_AUTH_URL. On the one hand I want a single method to call to construct a URI; on the other I don't want these ifs. What should I do?
If you always produce a URI and only its value differs, you can use a map:
uriMap.put("POST_AUTH_URL", URIUtils.createURI(null, MyConfig.WEB_SERVER, -1, apiUri, null, null));
You can access this map later:
uri = uriMap.get(apiUri);
You could theoretically create an enum URI_TYPE with method createURI (it is difficult to derive the actual parameters from the provided code snippet). This way you'd simply invoke this method on a specific enum value, which would have its own specific implementation.
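One way the enum idea could look is sketched below. Everything in it is an assumption made for illustration: the constant names, the paths, the https scheme, and the use of java.net.URI in place of the URIUtils helper from the question. The point is only that the special case lives in the constant that needs it, not in an if chain.

```java
import java.net.URI;
import java.net.URISyntaxException;

enum ApiEndpoint {
    // Hypothetical endpoints; AUTH is the only one without a token in its path.
    AUTH("/api/auth") {
        @Override
        String pathFor(String token) {
            return template(); // no token to insert, so a null token is fine here
        }
    },
    PROFILE("/api/<token>/profile"),
    ITEMS("/api/<token>/items");

    private final String template;

    ApiEndpoint(String template) {
        this.template = template;
    }

    String template() {
        return template;
    }

    /** Default behaviour: substitute the token placeholder. */
    String pathFor(String token) {
        return template().replace("<token>", token);
    }

    /** Single entry point for callers, regardless of which endpoint is used. */
    URI createUri(String host, String token) {
        try {
            return new URI("https", host, pathFor(token), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for " + this, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(PROFILE.createUri("example.com", "abc123"));
        System.out.println(AUTH.createUri("example.com", null));
    }
}
```

A new special case then means one more constant with its own pathFor override, and the calling code stays the same.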
Please refer to this and this for more information. Hope it helps.