Input text in the search box of a website programmatically using Java - java

I'm writing a Java program that programmatically inserts data into the search field of a website and then submits the form.
After submission a new web page is opened.
E.g. if the website is www.pqr.net/index.php, after I submit the search I'm redirected to another page, e.g. www.pqr.net/ind2.php.
I know I can read data using URLConnection.
How do I get the URL of the page I'm redirected to? I want to read the contents of that page, and as long as I don't know its URL I can't read them.
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.pqr.net");
// Get the search form, fill in the search field and submit it
HtmlForm form = page.getFormByName("f1");
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlTextInput textField = form.getInputByName("searc");
textField.setValueAttribute("value");
// The click returns the page you are redirected to
final HtmlPage page2 = button.click();

The URL of the page you are redirected to is in the Location header of the response message. Please refer to the HTTP specification for the details, and to the HttpURLConnection javadoc for the method you should use to read a header from the response.
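For illustration, a minimal sketch of both routes, assuming the question's snippet (the button variable comes from it) and that www.pqr.net redirects as described. With a plain HttpURLConnection you have to switch off automatic redirect following to see the Location header, while with HtmlUnit the page returned by click() already knows its own URL (imports for java.net.HttpURLConnection and java.net.URL omitted, as in the other snippets):

// Plain HttpURLConnection: disable automatic redirects so the Location header stays visible
HttpURLConnection conn = (HttpURLConnection) new URL("http://www.pqr.net/index.php").openConnection();
conn.setInstanceFollowRedirects(false);
String redirectUrl = conn.getHeaderField("Location"); // null if the server did not redirect

// HtmlUnit: the clicked page is already the redirected page
final HtmlPage page2 = button.click();
System.out.println(page2.getUrl()); // e.g. http://www.pqr.net/ind2.php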

Related

Crawl dynamically changed web page with HtmlUnit

I want to crawl a web page using HtmlUnit. My purpose is:
Load page
Write something to text field
Press download button
Get new page
This is the web site: https://9xbuddy.com/
Using a browser I can type a URL into the text field, press the download button and get a download link.
My code is:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setCssEnabled(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage("https://9xbuddy.com/sites/fembed");
final HtmlForm form = page.getForms().get(0);
final HtmlInput urlInput = form.getInputByName("url");
urlInput.click();
urlInput.type(iframeUrl); // iframeUrl holds the URL I want to submit for download
final List<HtmlButton> byXPath = (List<HtmlButton>) form.getByXPath("//button[@class='orange-gradient submit_btn']");
final HtmlPage click = byXPath.get(0).click();
webClient.waitForBackgroundJavaScript(15000);
The problem is:
When I press the download button it probably sends an Ajax request, because the title changes to "save" and after a few seconds changes to "Process completed". With the code above I want to wait for all Ajax requests to finish, but what I finally get is the "save" title, which means HtmlUnit didn't wait for the Ajax call. What is the right way to do it?
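One thing worth trying (this is not from the original thread, just a sketch that assumes the result really does arrive through a later Ajax update of the title): keep re-checking the title of the page returned by the click, giving the background JavaScript a bit more time on each pass, until it reports completion or a limit is reached:

final HtmlPage result = byXPath.get(0).click();
// Poll until the Ajax call has updated the title, up to roughly 30 seconds in total
for (int i = 0; i < 30 && !result.getTitleText().contains("Process completed"); i++) {
    webClient.waitForBackgroundJavaScript(1000);
}
System.out.println(result.getTitleText());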

Can I extract information from linkedIn using java HtmlUnit library?

I tried hard to find a way to extract data from my LinkedIn account without using the REST API, but without any result :/ Does anyone know if it's possible, and how?
When I tried this code in Eclipse the result was either a NullPointerException or null when I selected some fields from the response HTML page.
Note that the selector path works well in the browser console.
Thank you very much.
String url = "https://www.linkedin.com/uas/login?goback=&trk=hb_signin";
final WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(false);
webClient.getOptions().setCssEnabled(false);
HtmlPage loginPage = webClient.getPage(url);
final HtmlForm loginForm = loginPage.getFormByName("login");
final HtmlSubmitInput button = loginForm.getInputByName("signin");
final HtmlTextInput usernameTextField =
loginForm.getInputByName("session_key");
final HtmlPasswordInput passwordTextField =
loginForm.getInputByName("session_password");
usernameTextField.setValueAttribute("something#outlook.com");
passwordTextField.setValueAttribute("**************");
final HtmlPage response = button.click();
loginPage=webClient.getPage("https://www.linkedin.com/in/issa-hammoud-
0a2802114/");
System.out.println(loginPage.querySelector("#profile-wrapper > div.pv-
content.profile-view-grid.neptune-grid.two-column.ghost-animate-in >
div.core-rail > section div > div > button > img");
Since you are making a secured connection (HTTPS) you need to specify getOptions().setUseInsecureSSL(true);
Also make sure you enable cookies: getCookieManager().setCookiesEnabled(true);
Having said that, you should really be using LinkedIn's REST API.
Hope that helps.
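Put together with the question's code, the answer's two settings amount to configuring the WebClient before the first getPage call (only a sketch of where those two lines go; everything else stays as in the question):

final WebClient webClient = new WebClient();
// Settings suggested above
webClient.getOptions().setUseInsecureSSL(true);       // don't fail on certificate problems over HTTPS
webClient.getCookieManager().setCookiesEnabled(true); // keep the session cookie set at login
HtmlPage loginPage = webClient.getPage(url);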

Open a web browser page after a POST request using Htmlunit library

I'm testing my website, and what I do is navigate inside it using the HtmlUnit library and Java. Like this, for example:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
HtmlPage page1 = webClient.getPage(mypage);
// sent using POST
HtmlForm form = page1.getForms().get(0);
HtmlSubmitInput button = form.getInputByName("myButton");
HtmlPage page2 = button.click();
// I want to open page2 on a web browser and continue there using a function like
// continueOnBrowser(page2);
I filled in a form programmatically using HtmlUnit, then I submitted it; the form uses the POST method. But I'd like to see the content of the response inside a web browser page. The problem is that if I just open the URL in a browser it doesn't work, since the page is the response to a POST request.
It seems like the wrong approach to me; obviously, if you do everything programmatically you can't expect to just open the browser and continue from there... I can't figure out what could solve my problem.
Do you have any suggestions?
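One possible approach (not from the thread, just a sketch of what a continueOnBrowser(page2) helper could look like): dump the response HtmlUnit already received into a temporary HTML file and hand that file to the default browser via java.awt.Desktop. Relative links and the HtmlUnit session cookies are of course not carried over:

import java.awt.Desktop;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class BrowserHandoff {
    // Hypothetical helper: save the page HtmlUnit received and open it locally
    public static void continueOnBrowser(HtmlPage page) throws Exception {
        File tmp = File.createTempFile("htmlunit-response", ".html");
        Files.write(tmp.toPath(), page.asXml().getBytes(StandardCharsets.UTF_8));
        Desktop.getDesktop().browse(tmp.toURI());
    }
}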

POSTing a request to the correct URL when HtmlUnit is ignoring form.setActionAttribute and form.setAttribute

I'm trying to submit a form using HtmlUnit, but it seems that the action attribute of the form is ignored and the HTTP POST goes to the same page.
I'm getting the form from this URL:
http://www.tjse.jus.br/tjnet/consultas/internet/consnomeparte.wsp
In the source code of that page we can see that the action attribute is set to this URL:
http://www.tjse.jus.br/tjnet/consultas/internet/respconsnomeparte.wsp
But HtmlUnit always posts to the first URL.
I'm using Fiddler to analyse the request made by a real web browser and the one made by HtmlUnit, and comparing the two HTTP POSTs it's easy to see that HtmlUnit is POSTing to the same site, i.e., the first URL mentioned.
I need HtmlUnit to POST to the second URL.
If anyone could help me I'd appreciate it.
Problem solved.
Instead of using:
HtmlPage page2 = button.click();
I used:
button.click().getWebResponse().getContentAsString();
I would use something similar to the following.
// Enter your username in the field
searchForm.getInputByName("Username").setValueAttribute(schoolID);
//Submit the form and get the result page
HtmlPage pageResult = (HtmlPage) searchForm.getInputByValue("Search").click();
//Page results in raw html source code
String html = pageResult.asXml();
/*
* filter source code if needed to collect desired data
*/
//login via another server url
page = (HtmlPage) webClient.getPage("https://"+url);
HtmlForm LoginForm = page.getFormByName("Form1");
// login to web portal
LoginForm.getInputByName("txtUserName").setValueAttribute(username);
LoginForm.getInputByName("txtPassword").setValueAttribute(password);
//Submit the form and get the result page
HtmlPage pageResult = (HtmlPage) LoginForm.getInputByName("btnLogin").click();
Note: this HtmlUnit code complies with the HtmlUnit 2.15 API.

How to programmatically access web page in java

There is a web page from which I want to retrieve a certain string. In order to do so, I need to log in, click some buttons, fill in a text box and click another button - and then the string appears.
How can I write a Java program to do that automatically? Are there any useful libraries for that purpose?
Thanks
Try HtmlUnit:
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
Example code for submitting a form:
@Test
public void submittingForm() throws Exception {
    final WebClient webClient = new WebClient();

    // Get the first page
    final HtmlPage page1 = webClient.getPage("http://some_url");

    // Get the form that we are dealing with and within that form,
    // find the submit button and the field that we want to change.
    final HtmlForm form = page1.getFormByName("myform");
    final HtmlSubmitInput button = form.getInputByName("submitbutton");
    final HtmlTextInput textField = form.getInputByName("userid");

    // Change the value of the text field
    textField.setValueAttribute("root");

    // Now submit the form by clicking the button and get back the second page.
    final HtmlPage page2 = button.click();

    webClient.closeAllWindows();
}
For more details check:
http://htmlunit.sourceforge.net/gettingStarted.html
The super simple way to do this is to use HtmlUnit:
http://htmlunit.sourceforge.net/
and what you want to do can be as simple as:
@Test
public void homePage() throws Exception {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
}
Take a look at the Apache HttpClient project, or if you need to run JavaScript on the page, try HttpUnit.
Well, when you press a button you usually make a request via the HTTP POST method, so you should use HttpClient to handle the request and HtmlParser to handle the response page containing the string you need.
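As a rough sketch of that idea with the Apache HttpClient 4.x API (the URL and field names here are made up, and the parsing step is left to whatever HTML parser you prefer):

import java.util.Arrays;

import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class FormPostExample {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            // Reproduce the request the button would have sent
            HttpPost post = new HttpPost("http://www.example.com/search.php");
            post.setEntity(new UrlEncodedFormEntity(
                    Arrays.asList(new BasicNameValuePair("search", "my query"))));
            try (CloseableHttpResponse response = client.execute(post)) {
                String html = EntityUtils.toString(response.getEntity());
                // ... feed "html" to an HTML parser and pull out the string you need
                System.out.println(html.length());
            }
        }
    }
}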
Yes:
java.net.URL#openConnection() will allow you to make HTTP requests and get the HTTP responses
Apache HttpComponents is a library that makes it easier to work with HTTP.
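For completeness, a bare java.net version of the same idea (a minimal sketch with a placeholder URL):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UrlConnectionExample {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://www.example.com/").openConnection();
        // Read the response body line by line
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}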
