Getting the main URL with Java

I need to get the main URL of a link. For example, instead of
https://stackoverflow.com/questions/ask
I want just
https://stackoverflow.com/
I know you can use URI, but for some reason it doesn't work well with my program, so I need an alternative (maybe regex?). This is what I tried with URI:
URI uri = new URI("the url");
String domain = uri.getHost();
return domain.startsWith("www.") ? domain.substring(4) : domain;

The simplest way is:
String url = "http://stackoverflow.com/questions/ask";
URI uri = new URI(url); // throws the checked URISyntaxException
String host = uri.getHost();
System.out.println(uri.getScheme() + "://" + host + "/");
Result:
http://stackoverflow.com/
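The question also asks about a regex alternative to URI. Here is a minimal sketch, assuming the input is an absolute URL of the form scheme://authority/...; RootUrl and rootOf are hypothetical names, and the pattern only checks that shape, it does not validate the URL. Any port or user info stays in the result, and a leading "www." is not removed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RootUrl {
    // Scheme (letters, digits, '+', '.', '-'), then "://", then the authority:
    // everything up to the first '/', '?' or '#'.
    private static final Pattern ROOT =
            Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]*)");

    // Returns "scheme://authority/", or null if the string does not look like an absolute URL.
    static String rootOf(String url) {
        Matcher m = ROOT.matcher(url);
        return m.find() ? m.group(1) + "/" : null;
    }

    public static void main(String[] args) {
        System.out.println(rootOf("https://stackoverflow.com/questions/ask")); // https://stackoverflow.com/
        System.out.println(rootOf("http://localhost:8080/app?x=1"));          // http://localhost:8080/
    }
}
```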

Related

Remove Everything Before the Third Forward Slash

I have the following strings:
http://somedomain.com/dir/sub/folder/file.txt
OR
https://10.0.0.1/dir/sub/folder/another_folder/file.txt
I want to remove everything before the third forward slash (remove the domain) and still keep the third forward slash.
Expected results:
/dir/sub/folder/file.txt
OR
/dir/sub/folder/another_folder/file.txt
On Android, android.net.Uri can split the URL into its parts for you:
Uri uri = Uri.parse("https://graph.facebook.com/me/home?limit=25&since=1374196005");
String protocol = uri.getScheme();
String server = uri.getAuthority();
String path = uri.getPath();
Set<String> args = uri.getQueryParameterNames();
String limit = uri.getQueryParameter("limit");
I think the path value (uri.getPath()) is what you need.
You can use the URL class; it is what you are looking for.
The URL class provides several methods that let you query URL objects. You can get the protocol, authority, host name, port number, path, query, filename, and reference from a URL using these accessor methods.
Use this:
URL aURL = new URL("https://10.0.0.1/dir/sub/folder/another_folder/file.txt");
String path = aURL.getPath();
Output result
path = /dir/sub/folder/another_folder/file.txt
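If you prefer to do literally what the question asks (cut at the third forward slash) without any URL class, a plain string sketch is enough. It assumes the input always starts with scheme://host, so the first two slashes are the ones in "://"; AfterThirdSlash and fromThirdSlash are hypothetical names. Any query string or fragment after the path is kept.

```java
final class AfterThirdSlash {
    // Returns the substring starting at the third '/', keeping that slash.
    // Returns "" when the string has fewer than three slashes (e.g. "http://host").
    static String fromThirdSlash(String s) {
        int pos = -1;
        for (int i = 0; i < 3; i++) {
            pos = s.indexOf('/', pos + 1);
            if (pos < 0) {
                return "";
            }
        }
        return s.substring(pos);
    }

    public static void main(String[] args) {
        System.out.println(fromThirdSlash("http://somedomain.com/dir/sub/folder/file.txt"));
        System.out.println(fromThirdSlash("https://10.0.0.1/dir/sub/folder/another_folder/file.txt"));
    }
}
```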

Get only the website name from the domain name of a URL

I need to convert a list of URLs to their host names, so I tried the code below:
URL netUrl = new URL(url);
String host = netUrl.getHost();
This code produces output like the following:
a95-101-128-242.deploy.akamaitechnologies.com
a23-1-242-192.deploy.static.akamaitechnologies.com
edge-video-shv-01-lht6.fbcdn.net
I want only the website name from that output, like this:
akamaitechnologies
akamaitechnologies
fbcdn
Can someone please help?
Thanks
If you want to parse a URL, use java.net.URI. java.net.URL has a number of problems; for example, its equals method does a DNS lookup, which means code using it can be vulnerable to denial-of-service attacks when used with untrusted input.
public static String getDomainName(String url) throws URISyntaxException {
URI uri = new URI(url);
String domain = uri.getHost();
return domain.startsWith("www.") ? domain.substring(4) : domain;
}
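This returns the host name without a leading "www.". Note that it does not reduce a host such as a95-101-128-242.deploy.akamaitechnologies.com to akamaitechnologies; for that you still have to pick out the label you want.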
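To get from the host to the label the question asks for, one naive option is to take the label just before the last dot. The sketch below assumes every host ends in a single-label suffix such as .com or .net; SiteName and siteName are hypothetical names. For suffixes like .co.uk you would need the Public Suffix List, for example via Guava's InternetDomainName.topPrivateDomain(), if adding Guava is an option.

```java
import java.net.URI;

final class SiteName {
    // Naive sketch: returns the label just before the last dot of the host.
    // Assumes a single-label suffix (.com, .net); for "example.co.uk" it would return "co".
    static String siteName(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String[] labels = host.split("\\.");
        return labels.length >= 2 ? labels[labels.length - 2] : host;
    }

    public static void main(String[] args) {
        System.out.println(siteName("http://a95-101-128-242.deploy.akamaitechnologies.com/")); // akamaitechnologies
        System.out.println(siteName("https://edge-video-shv-01-lht6.fbcdn.net/video.mp4"));     // fbcdn
    }
}
```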

How to extract the relative URL from an absolute URL in Java

I have this URL:
https://asd.com/somestuff/another.html
and I want to extract the relative part out of it:
somestuff/another.html
How do I do that?
EDIT: I was pointed to another question, but the problem there was building the absolute URL from the relative one, which is not what I'm interested in.
You could use the getPath() method of the URL object:
URL url = new URL("https://asd.com/somestuff/another.html");
System.out.println(url.getPath()); // prints "/somestuff/another.html"
This only returns the path. If you need more information (the anchor, or the parameters passed as GET values), you need to call other accessors of the URL object:
URL url = new URL("https://asd.com/somestuff/another.html?param=value#anchor");
System.out.println(url.getPath()); // prints "/somestuff/another.html"
System.out.println(url.getQuery()); // prints "param=value"
System.out.println(url.getRef()); // prints "anchor"
To generate the relative URL without much code, based on Hiru's answer:
URL absolute = new URL(url, "/");
String relative = url.toString().substring(absolute.toString().length());
System.out.println(relative); // prints "somestuff/another.html?param=value#anchor"
If you know that the domain will always be .com, you can try something like this:
String url = "https://asd.com/somestuff/another.html";
String[] parts = url.split("\\.com/", 2); // split() takes a regex, so the dot must be escaped
// parts[1] is the string after ".com/"
The URL consists of the following elements (note that some optional elements are omitted):
1) scheme
2) hostname
3) [port]
4) path
5) query
6) fragment
Using the Java URL API, you can do the following:
URL u = new URL("https://randomsite.org/another/randomPage.html");
System.out.println(u.getPath());
Edit #1
Seeing Chop's answer: if your URL has query elements, such as
?name=foo&value=bar
note that the getQuery() method returns only the query part, not the resource path.
You can do this with the snippet below:
String str = "https://asd.org/somestuff/another.html";
if (str.contains("//")) { // remove any protocol-specific prefix such as "https://"
str = str.split("//", 2)[1];
}
System.out.println(str.substring(str.indexOf("/") + 1)); // everything after the first '/'
Try this; it works for any domain, not only .com:
URL u = new URL("https://asd.in/somestuff/another.html");
String u1 = new URL(u, "/").toString(); // the root, e.g. "https://asd.in/"
String u2 = u.toString();
String u3 = u2.substring(u1.length()); // split(u1) would treat u1 as a regex
System.out.println(u3); // prints: somestuff/another.html
My solution based on java.net.URI
URI _absoluteURL = new URI(absoluteUrl).normalize();
String root = _absoluteURL.getScheme() + "://" + _absoluteURL.getAuthority();
URI relative = new URI(root).relativize(_absoluteURL);
String result = relative.toString();
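A self-contained version of the java.net.URI approach above, for trying it out. It builds the root as scheme://authority/ (with a trailing slash) and relativizes against it; query and fragment are kept. RelativeUrl and relativePart are hypothetical names, and the sketch assumes the URL has a path (for https://asd.com with no path at all, relativize returns the URL unchanged).

```java
import java.net.URI;

final class RelativeUrl {
    // Relativizes an absolute URL against its own root (scheme + authority + "/").
    // URI.create throws the unchecked IllegalArgumentException on bad input.
    static String relativePart(String absoluteUrl) {
        URI absolute = URI.create(absoluteUrl).normalize();
        URI root = URI.create(absolute.getScheme() + "://" + absolute.getRawAuthority() + "/");
        return root.relativize(absolute).toString();
    }

    public static void main(String[] args) {
        System.out.println(relativePart("https://asd.com/somestuff/another.html"));
        System.out.println(relativePart("https://asd.com/somestuff/another.html?param=value#anchor"));
    }
}
```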
Consider using Apache Commons VFS...
import org.apache.commons.vfs2.FileSystemException;
import org.apache.commons.vfs2.VFS;
import org.apache.commons.vfs2.impl.StandardFileSystemManager;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLStreamHandlerFactory;
public class StudyURI {
public static void main(String[] args) throws URISyntaxException, FileSystemException {
StandardFileSystemManager fileSystemManager = (StandardFileSystemManager) VFS.getManager();
URLStreamHandlerFactory factory = fileSystemManager.getURLStreamHandlerFactory();
URL.setURLStreamHandlerFactory(factory);
URI baseURI = fileSystemManager.resolveFile("https://asd.com/").getURI();
URI anotherURI =fileSystemManager.resolveFile("https://asd.com/somestuff/another.html").getURI();
String result = baseURI.relativize(anotherURI).getPath();
System.out.println(result);
}
}
You may need to add this dependency to run the code:
https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient

java.net.MalformedURLException: no protocol on URL based on a string modified with URLEncoder

I was attempting to use this string in a URL:
http://site-test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
in this code:
String fileToDownloadLocation = //The above string
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
But at this point I get the error:
java.net.URISyntaxException: Illegal character in query at index 169:Blahblahblah
After a bit of googling I realised this was due to the characters in the URL (I guessed the &), so I added some code and it now looks like this:
String fileToDownloadLocation = //The above string
fileToDownloadLocation = URLEncoder.encode(fileToDownloadLocation, "UTF-8");
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
However, when I run this, I get an error when creating the URL:
java.net.MalformedURLException: no protocol: http%3A%2F%2Fsite-test.testsite.com%2FMeetings%2FIC%2FDownloadDocument%3FmeetingId%3Dc21c905c-8359-4bd6-b864-844709e05754%26itemId%3Da4b724d1-282e-4b36-9d16-d619a807ba67%26file%3D%5C%5Cs604132shvw140%5CTest-Documents%5Cc21c905c-8359-4bd6-b864-844709e05754_attachments%5C7e89c3cb-ce53-4a04-a9ee-1a584e157987%5CmyDoc.pdf
It looks like I can't do the encoding before creating the URL, because it replaces slashes and other characters that it shouldn't, but I can't see how to create the URL from the string first and then format it for use. I'm not very familiar with this and was hoping someone could point out what I'm missing to turn string A into a properly formatted URL, with the correct characters replaced.
Any suggestions greatly appreciated!
You need to encode your parameter values before concatenating them into the URL.
The backslash \ is a special character that has to be escaped as %5C.
Escaping example:
String paramValue = "param\\with\\backslash";
String yourURLStr = "http://host.com?param=" + java.net.URLEncoder.encode(paramValue, "UTF-8");
java.net.URL url = new java.net.URL(yourURLStr);
The result is http://host.com?param=param%5Cwith%5Cbackslash, which is a properly formatted URL string.
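The same idea extended to several parameters: encode each name and value on its own, and leave the scheme, host and path alone, so http:// is never turned into http%3A%2F%2F. This is a sketch assuming Java 10 or later (for the URLEncoder.encode(String, Charset) overload); QueryBuilder and withQuery are hypothetical names. URLEncoder produces form encoding, so a space becomes '+', which servers normally decode as a space in the query string.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class QueryBuilder {
    // Encodes each parameter name and value separately, then joins them with '=' and '&'.
    // The base URL (scheme, host, path) is appended unencoded.
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("meetingId", "c21c905c-8359-4bd6-b864-844709e05754");
        params.put("file", "\\\\s604132shvw140\\Test-Documents\\myDoc.pdf");
        String url = withQuery("http://site-test.com/Meetings/IC/DownloadDocument", params);
        System.out.println(url);
        // Parses without URISyntaxException, because the backslashes are now %5C.
        System.out.println(URI.create(url).getHost());
    }
}
```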
I had the same problem; I read the URL from a properties file:
String configFile = System.getenv("system.Environment");
if (configFile == null || "".equalsIgnoreCase(configFile.trim())) {
configFile = "dev.properties";
}
// Load properties
Properties properties = new Properties();
properties.load(getClass().getResourceAsStream("/" + configFile));
//read url from file
apiUrl = properties.getProperty("url").trim();
URL url = new URL(apiUrl); // the exception is thrown here
URLConnection conn = url.openConnection();
My dev.properties contained:
url = "https://myDevServer.com/dev/api/gate"
It should be:
dev.properties
url = https://myDevServer.com/dev/api/gate
without the quotation marks. That solved my problem.
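According to the Oracle documentation, MalformedURLException is:
Thrown to indicate that a malformed URL has occurred. Either no legal protocol could be found in a specification string or the string could not be parsed.
So the string passed to new URL(...) did not start with a recognizable protocol: in the question, URLEncoder.encode turned "http://" into "http%3A%2F%2F".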
You want to use URI templates. Look carefully at the README of this project: URLEncoder.encode() does NOT work for URIs.
Let us take your original URL:
http://site-test.test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
and convert it to a URI template with three variables (on multiple lines for clarity):
http://site-test.test.com/Meetings/IC/DownloadDocument
?meetingId={meetingID}&itemId={itemID}&file={file}
Now let us build a variable map with these three variables using the library mentioned in the link:
final VariableMap vars = VariableMap.newBuilder()
.addScalarValue("meetingID", "c21c905c-8359-4bd6-b864-844709e05754")
.addScalarValue("itemID", "a4b724d1-282e-4b36-9d16-d619a807ba67")
.addScalarValue("file", "\\\\s604132shvw140\\Test-Documents"
+ "\\c21c905c-8359-4bd6-b864-844709e05754_attachments"
+ "\\7e89c3cb-ce53-4a04-a9ee-1a584e157987\\myDoc.pdf")
.build();
final URITemplate template
= new URITemplate("http://site-test.test.com/Meetings/IC/DownloadDocument"
+ "?meetingId={meetingID}&itemId={itemID}&file={file}");
// Generate URL as a String
final String theURL = template.expand(vars);
The library encodes each variable value for you, so this returns a correctly encoded URL.
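If you'd rather stay with the JDK, the multi-argument java.net.URI constructor also quotes characters that are illegal in a URI, so the backslashes become %5C. This is a sketch of that approach; QuotingUri and build are hypothetical names. Unlike the URI template, it does not encode '&', '=' or '+' inside a value, so it is only safe when the values contain none of those.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class QuotingUri {
    // The five-argument URI constructor quotes characters that are not legal in a URI,
    // so '\' becomes %5C. Reserved characters such as '&', '=' and '+' are left as-is.
    static String build(String host, String path, String query) {
        try {
            return new URI("http", host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String query = "meetingId=c21c905c-8359-4bd6-b864-844709e05754"
                + "&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67"
                + "&file=\\\\s604132shvw140\\Test-Documents\\myDoc.pdf";
        // The backslashes in the file value come out as %5C.
        System.out.println(build("site-test.test.com", "/Meetings/IC/DownloadDocument", query));
    }
}
```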
Thanks to Erhun's answer, I finally realised that my JSON mapper was returning the quotation marks around my data too! I needed to use asText() instead of toString().
It's not an uncommon issue: your brain doesn't see anything wrong with correct data surrounded by quotes!
discoveryJson.path("some_endpoint").toString();
"https://what.the.com/heck"
discoveryJson.path("some_endpoint").asText();
https://what.the.com/heck
This code worked for me:
public static void main(String[] args) {
try {
java.net.URL url = new java.net.URL("http://path");
System.out.println("Instantiated new URL: " + url);
}
catch (java.net.MalformedURLException e) {
e.printStackTrace();
}
}
Instantiated new URL: http://path
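A very simple fix, using Spring's org.springframework.web.util.UriUtils:
String encodedURL = UriUtils.encodePath(request.getUrl(), "UTF-8");
It works without any extra code. Note that encodePath applies path-encoding rules, so a '?' in the string is percent-encoded too.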

how to fetch base url from the given url using java

I am trying to fetch the base URL using Java. I use the jtidy parser in my code to fetch the title. I get the title properly with jtidy, but I cannot get the base URL from the given URL.
I have some URLs as input:
String s1 = "http://staff.unak.is/andy/GameProgramming0910/new_page_2.htm";
String s2 = "http://www.complex.com/pop-culture/2011/04/10-hottest-women-in-fast-and-furious-movies";
From the first string, I want to fetch "http://staff.unak.is/andy/GameProgramming0910/" as a base URL and from the second string, I want "http://www.complex.com/" as a base URL.
I am using this code:
URL url = new URL(s1);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
InputStream in = conn.getInputStream();
Document doc = new Tidy().parseDOM(in, null);
String titleText = doc.getElementsByTagName("title").item(0).getFirstChild()
.getNodeValue();
I am getting the title text, but can you please tell me how to get the base URL from the URLs above?
Try the java.net.URL class:
For the second case, which is easier, you could use new URL(s2).getHost(), which returns just the host name (www.complex.com).
For the first case, you could get the host and the path, and remove everything after the last slash ("/"). Something like this (code not tested):
URL url = new URL(s1);
String path = url.getPath().substring(0, url.getPath().lastIndexOf('/') + 1); // keep the trailing '/'
String base = url.getProtocol() + "://" + url.getHost() + path;
You can use the java.net.URL class to resolve relative URLs.
For the first case: removing the filename from the path:
new URL(new URL(s1), ".").toString()
For the second case: setting the root path:
new URL(new URL(s2), "/").toString()
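The same two resolutions with java.net.URI, which avoids the checked MalformedURLException because URI.create throws an unchecked IllegalArgumentException. BaseUrl, directoryOf and rootOf are hypothetical names. Resolving "." drops the last path segment, and resolving "/" keeps only the scheme and authority; in both cases any query string is dropped.

```java
import java.net.URI;

final class BaseUrl {
    // Resolves "." against the URL: removes the file name, keeps the directory path.
    static String directoryOf(String url) {
        return URI.create(url).resolve(".").toString();
    }

    // Resolves "/" against the URL: keeps only scheme and authority.
    static String rootOf(String url) {
        return URI.create(url).resolve("/").toString();
    }

    public static void main(String[] args) {
        String s1 = "http://staff.unak.is/andy/GameProgramming0910/new_page_2.htm";
        String s2 = "http://www.complex.com/pop-culture/2011/04/10-hottest-women-in-fast-and-furious-movies";
        System.out.println(directoryOf(s1)); // http://staff.unak.is/andy/GameProgramming0910/
        System.out.println(rootOf(s2));      // http://www.complex.com/
    }
}
```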
