How to program a script that changes url - java

I want to make a Tampermonkey script that changes the URL of the page. It should check whether the URL contains "youtube.com" and, if it doesn't, add /youtube.com to the URL.
An example of this is:
The starting website: www.website.com/watch8dzjad8
The changed website: www.website.com/youtube.com/watch8dzjad8
The script is meant to run in Tampermonkey on one specific website: it should scan the link and add /youtube.com if it is missing, since the page won't work otherwise. It would save me from copying and pasting /youtube.com 10 times a day, and I'd also like to learn how to work with URLs in JavaScript. Thanks in advance.

// check the whole URL; location.host alone never includes the added /youtube.com path
if( !location.href.match(/youtube\.com/) )
location= "/youtube.com"+ location.pathname
But rather than applying this to every page whose URL lacks "youtube.com", you should restrict the behaviour to a specific site, for example:
if( location.href.match(/website.com\/watch/) )
location= "/youtube.com"+ location.pathname
Explanations
location.href.match(/website.com\/watch/)
location.host is the domain of the page (www.website.com)
location.href is the complete URL of the page (http://www.website.com/watch8dzjad8)
match tests whether the string follows the given pattern
location= "/youtube.com"+ location.pathname
setting location implies opening the given URL
location.pathname gives the path of the URL (/watch8dzjad8)
So if the URL (http://www.website.com/watch8dzjad8) of the visited page contains the string "website.com/watch", then open "/youtube.com" + "/watch8dzjad8".
As the domain is the same, a relative URL is enough: the browser resolves it against the domain of the current page.
https://developer.mozilla.org/en-US/docs/Web/API/Window/location
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match

Try this:
function getQueryValue( name ){
// escape square brackets so the name is safe inside a regular expression
name = name.replace(/[\[]/,"\\[").replace(/[\]]/,"\\]");
var regexS = "[\\?&]" + name + "=([^&#]*)";
var regex = new RegExp( regexS );
var results = regex.exec( location.href );
if( results == null )
return "";
else
// results[1] is the captured parameter value
return results[1];
}
//new url, read from the "curUrl" query parameter of the current url
var newUrl = getQueryValue( "curUrl" );
//redirect to new page, but only if the parameter was present
if( newUrl !== "" )
location.href = newUrl;

Related

How to open a new URL in the same tab with Selenium after editing the current URL?

I want to open a new URL in the same tab in Selenium, but I need to somehow edit the URL. What working options are there? I am thinking about navigating by keys, but the problem is that the URL can't be inspected.
To open and edit the url on the same tab:
1) First, get the current page URL with the getCurrentUrl command:
Usage: gets the URL of the web page currently open in the browser.
Syntax: driver.getCurrentUrl();
Return type: String (the URL of the current web page)
Example:
String currentURL = driver.getCurrentUrl();
System.out.println(currentURL);
2) Now edit the URL:
Once you have the URL, you can change whatever you want, for example:
String currentURL = driver.getCurrentUrl();
// Let's assume currentURL = {something}/product/id1/
String editURL = currentURL.replace("id1", "id2");
OR
String editURL = currentURL + "/something/";
3) Open the edited URL in the same tab:
driver.get(editURL);
OR
driver.navigate().to(editURL);
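As a rough sketch of step 2 using only the JDK: replace("id1", "id2") on the whole string changes every occurrence, including any in the query string, so it can be safer to edit just the path. The class and method names below (UrlEditSketch, replacePathSegment) and the shop.example URL are made up for illustration; they are not part of Selenium.

```java
import java.net.URI;

final class UrlEditSketch {

    // Replaces the first path segment equal to oldSegment with newSegment,
    // leaving the query string and fragment untouched.
    // Assumes a hierarchical URL such as http(s)://host/path.
    static String replacePathSegment(String url, String oldSegment, String newSegment) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        String[] segments = uri.getRawPath().split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (segments[i].equals(oldSegment)) {
                segments[i] = newSegment;
                break;
            }
        }
        StringBuilder edited = new StringBuilder();
        edited.append(uri.getScheme()).append("://").append(uri.getRawAuthority());
        edited.append(String.join("/", segments));
        if (uri.getRawQuery() != null) {
            edited.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            edited.append('#').append(uri.getRawFragment());
        }
        return edited.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value returned by driver.getCurrentUrl()
        String currentURL = "https://shop.example/product/id1/?ref=id1";
        // prints https://shop.example/product/id2/?ref=id1
        System.out.println(replacePathSegment(currentURL, "id1", "id2"));
    }
}
```

The returned string can then be passed to driver.get(...) or driver.navigate().to(...) as in step 3.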
There is a Selenium method called getCurrentUrl. Use it like so,
String currentURL = driver.getCurrentUrl();
String newURL = currentURL + "yourEdit";

Trying to find specific links while web crawling

I am modifying the code given in crawler4j. I want to find specific links while crawling a web site. For example, I am crawling www.cmu.edu and trying to get the link for the directory search. Here is my code for it:
public void visit(Page page) {
String url = page.getWebURL().getURL();
// System.out.println("URL: " + url);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String text = htmlParseData.getText();
String html = htmlParseData.getHtml();
System.out.println(html.matches(".*<a href.*."));
if (html.matches(".*.<a href=.*.>Directory Search</a>.*."))
System.out.println("***********Hello*********************");
// System.out.println("----------"+html);
return;
// List<WebURL> links = htmlParseData.getOutgoingUrls();
}
}
This code does not work: I am not getting the ***********Hello********************* line on my console. Just to check, I printed the html string to the console, copied the anchor tag that contains the directory search, and wrote this simple two-line test:
String test2="<li class=\"first\">Directory Search</li>";
System.out.println("*******"+test2.matches(".*.<a href=.*.>Directory Search</a>.*."));
This works. The value of String test2 is copied from the console. What am I doing wrong in the first part of the code?
Try this (by default . does not match line terminators, so you have to use (?s) to make it match newline characters too):
String test2="qwert\n\n<li class=\"first\">Directory Search</li>";
System.out.println("*******"+test2.matches("(?s).*.<a href=.*.>Directory Search</a>.*."));
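To see the difference in isolation, here is a small self-contained sketch. The HTML snippet and the /directory link in it are invented for the demonstration (they are not the real cmu.edu markup), and DotAllSketch is just an illustrative class name. It compares the pattern from the question with and without the (?s) flag; Pattern.DOTALL is the equivalent compile-time flag.

```java
import java.util.regex.Pattern;

final class DotAllSketch {

    // Pattern from the question, unchanged
    static final String REGEX = ".*.<a href=.*.>Directory Search</a>.*.";

    // String.matches must match the WHOLE input, and '.' stops at line breaks by default
    static boolean matchesDefault(String html) {
        return html.matches(REGEX);
    }

    // Inline flag (?s): '.' now also matches line terminators
    static boolean matchesInlineFlag(String html) {
        return html.matches("(?s)" + REGEX);
    }

    // Same effect using the Pattern.DOTALL constant instead of the inline flag
    static boolean matchesDotAll(String html) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(html).matches();
    }

    public static void main(String[] args) {
        // Hypothetical multi-line page fragment
        String html = "<ul>\n<li class=\"first\"><a href=\"/directory\">Directory Search</a></li>\n</ul>";
        System.out.println(matchesDefault(html));    // false
        System.out.println(matchesInlineFlag(html)); // true
        System.out.println(matchesDotAll(html));     // true
    }
}
```

A real page almost always contains line breaks, which is why the one-line test string matched while the crawled HTML did not.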

How to read the public URL in GWT?

I'm new to GWT and I'm building a web application in which I have to create a public URL.
In this public URL I have to pass a hash (#) and some parameters.
I am having difficulty with two tasks:
Extracting the hashtag from the URL.
Extracting the userid from the URL.
My public URL example is: http://www.xyz.com/#profile?userid=10003
To access the URL in GWT you can use the History.getToken() method. It will give you the entire string that follows the hashtag ("#").
In your case (http://www.xyz.com/#profile?userid=10003) it will return the string "profile?userid=10003". Once you have this, you can parse it however you want: check whether it contains "?", split it on "?" (in Java that is split("\\?"), because split takes a regular expression), or take a substring. How you extract the information from it is really up to you.
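One possible way to do that parsing, as a plain-Java sketch: it takes the token string that History.getToken() is described as returning above and separates the part before "?" from the key=value pairs after it. HistoryTokenSketch, placeOf and parametersOf are hypothetical names chosen for this example, not GWT APIs, and the sketch does no URL decoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HistoryTokenSketch {

    // Everything before the first '?', e.g. "profile"
    static String placeOf(String token) {
        int q = token.indexOf('?');
        return q < 0 ? token : token.substring(0, q);
    }

    // key=value pairs after the first '?', in their original order
    static Map<String, String> parametersOf(String token) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = token.indexOf('?');
        if (q < 0 || q == token.length() - 1) {
            return params; // no parameter section
        }
        for (String pair : token.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Stand-in for the value of History.getToken() in the example URL
        String token = "profile?userid=10003";
        System.out.println(placeOf(token));                    // profile
        System.out.println(parametersOf(token).get("userid")); // 10003
    }
}
```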
I guess you already have the URL. I'm not that good at Regex, but this should work:
String yourURL = "http://www.xyz.com/#profile?userid=10003";
String[] array = yourURL.split("[\\p{Lower}\\p{Upper}\\p{Punct}]");
int userID = 0;
for (String string : array) {
if (!string.isEmpty()) {
userID = Integer.valueOf(string);
}
}
System.out.println(userID);
To get the parameters:
String userId = Window.Location.getParameter("userid");
To get the anchor / hash tag:
I don't think there is a dedicated helper for that, but you can parse the URL yourself: look at the methods provided by Window.Location.

How to use Jsoup on a site that lazy-loads content with scrollLoader.js

I have a problem with Jsoup because of lazy loading via scrollLoader.js.
I fetch the site from Java code, and Jsoup lists only 50 image names. But when I scroll down on the site, many more images load continuously. My question is: is it possible to pass the image count in the URL used with Jsoup.connect() so that I get all the images from the site?
Here is the site: http://www.logowik.com
And this is how the script is used on the site:
<script type="text/javascript">
$(document).ready(function(e) {
CalculateColumns();
recordCount = 50;
groupID = "0";
catID = "0";
query = "";
userEntry = "";
groupInterval = "0";
AddEvent(window, "resize", CalculateColumns);
document["scrollLoader"] = new scrollLoader({evn : getGrids, seize : 1});
document["scrollLoader"].DoScroll();
addLogoClickEvent();
});
</script>
I pass these parameters in the URL like this:
http://www.logowik.com/index.php?g=1&groupID=1&catID=0
With this URL I get 50 images, because recordCount = 50 in the script, but I cannot pass that parameter in the URL.
To get 100 images, I tried this URL: http://www.logowik.com/index.php?recordCount=100&g=1&groupID=1&catID=0
but it has no effect.
Thanks
Use Firebug or the Network panel of Chrome DevTools to see all the requests generated when the images load, then recreate those requests in Jsoup.

url harvester string manipulation

I'm doing a recursive URL harvest. When I find a link in the source that doesn't start with "http", I append it to the current URL. The problem is that on a dynamic site, a link without "http" is usually a new parameter for the current URL. For example, the current URL might be http://www.somewebapp.com/default.aspx?pageid=4088, and the source for that page contains the link default.aspx?pageid=2111. In this case I need to do some string manipulation; this is where I need help.
pseudocode:
if the link found contains a substring of the current url
save the substring
save the unique part of the link found
replace whatever is after the substring in the current url with the unique saved part
What would this look like in java? Any ideas for doing this differently? Thanks.
As per comment, here's what I've tried:
if (!matched.startsWith("http")) {
String[] splitted = url.toString().split("/");
java.lang.String endOfURL = splitted[splitted.length-1];
boolean b = false;
while (!b && endOfURL.length() > 5) { // f.bar shortest val
endOfURL = endOfURL.substring(0, endOfURL.length()-2);
if (matched.contains(endOfURL)) {
matched = matched.substring(endOfURL.length()-1);
matched = url.toString().substring(url.toString().length() - matched.length()) + matched;
b = true;
}
}
It's not working well.
I think you are doing this the wrong way. Java has two classes, URL and URI, which are capable of parsing URL/URI strings much more accurately than a "string bashing" solution. For example, the URL constructor URL(URL, String) creates a new URL object in the context of an existing one, without you needing to worry about whether the String is an absolute URL or a relative one. You would use it something like this:
URL currentPageUrl = ...
String linkUrlString = ...
// (Exception handling not included ...)
URL linkUrl = new URL(currentPageUrl, linkUrlString);
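For a runnable illustration with the URLs from the question, here is a sketch that uses URI.resolve, the java.net.URI counterpart of the URL(URL, String) constructor shown above (a swapped-in alternative, not the exact call from this answer). LinkResolveSketch and resolve are illustrative names, and example.org is a placeholder host.

```java
import java.net.URI;

final class LinkResolveSketch {

    // Resolves a link found on a page against that page's URL.
    // Absolute links come back unchanged; relative ones are combined with the base.
    // URI.create throws IllegalArgumentException if either string is not a valid URI.
    static String resolve(String currentPageUrl, String link) {
        URI base = URI.create(currentPageUrl);
        return base.resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.somewebapp.com/default.aspx?pageid=4088";

        // Relative link from the question
        // prints http://www.somewebapp.com/default.aspx?pageid=2111
        System.out.println(resolve(page, "default.aspx?pageid=2111"));

        // Root-relative link
        // prints http://www.somewebapp.com/about
        System.out.println(resolve(page, "/about"));

        // Absolute link is returned as-is
        // prints http://example.org/page
        System.out.println(resolve(page, "http://example.org/page"));
    }
}
```

With this approach the harvester does not need the startsWith("http") check at all: every link found can go through the same call.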
