Scraping from a website that needs authentication using Java

I found this code on an old forum and am trying to get it to work, but I am getting this error:
File: /net/home/f13/dlschnettler/Desktop/javaScraper/RedditClient.java [line: 46]
Error: cannot access org.w3c.dom.ElementTraversal
class file for org.w3c.dom.ElementTraversal not found
Here's the code:
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class RedditClient {
//Create a new WebClient with any BrowserVersion. WebClient belongs to the
//HtmlUnit library.
private final WebClient WEB_CLIENT = new WebClient(BrowserVersion.CHROME);
//This is pretty self explanatory, these are your Reddit credentials.
private final String username;
private final String password;
//Our constructor. Sets our username and password and does some client config.
RedditClient(String username, String password){
this.username = username;
this.password = password;
//Retrieves our WebClient's cookie manager and enables cookies.
//This is what allows us to view pages that require login.
//If this were set to false, the login session wouldn't persist.
WEB_CLIENT.getCookieManager().setCookiesEnabled(true);
}
public void login(){
//This is the URL where we log in, easy.
String loginURL = "https://www.reddit.com/login";
try {
//Okay, bear with me here. This part is simple but it can be tricky
//to understand at first. Reference the login form above and follow
//along.
//Create an HtmlPage and get the login page.
HtmlPage loginPage = WEB_CLIENT.getPage(loginURL);
//Create an HtmlForm by locating the form that pertains to logging in.
//"//form[@id='login-form']" means "Hey, look for a <form> tag with the
//id attribute 'login-form'" Sound familiar?
//<form id="login-form" method="post" ...
HtmlForm loginForm = loginPage.getFirstByXPath("//form[@id='login-form']");
//This is where we modify the form. The getInputByName method looks
//for an <input> tag with some name attribute. For example, user or passwd.
//If we take a look at the form, it all makes sense.
//<input value="" name="user" id="user_login" ...
//After we locate the input tag, we set the value to what belongs.
//So we're saying, "Find the <input> tags with the names 'user' and 'passwd'
//and put our username and password in those text fields."
loginForm.getInputByName("user").setValueAttribute(username);
loginForm.getInputByName("passwd").setValueAttribute(password);
//<button type="submit" class="c-btn c-btn-primary c-pull-right" ...
//Okay, you may have noticed the button has no name. What the line
//below does is locate all of the <button>s in the login form and
//clicks the first and only one. (.get(0)) This is something that
//you can do if you come across inputs without names, ids, etc.
loginForm.getElementsByTagName("button").get(0).click();
} catch (FailingHttpStatusCodeException e) {
e.printStackTrace();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public String get(String URL){
try {
//All this method does is return the HTML response for some URL.
//We'll call this after we log in!
return WEB_CLIENT.getPage(URL).getWebResponse().getContentAsString();
} catch (FailingHttpStatusCodeException e) {
e.printStackTrace();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Main {
public static void main(String[] args) {
//Create a new RedditClient and log us in!
RedditClient client = new RedditClient("hutsboR", "MyPassword!");
client.login();
//Let's scrape our messages, information behind a login.
//https://www.reddit.com/message/messages/ is the URL where messages are located.
String page = client.get("https://www.reddit.com/message/messages/");
//"div.md" selects all divs with the class name "md", that's where message
//bodies are stored. You'll find "<div class="md">" before each message.
Elements messages = Jsoup.parse(page).select("div.md");
//For each message in messages, let's print out message and a new line.
for(Element message : messages){
System.out.println(message.text() + "\n");
}
}
}
Not really sure how to fix it since I'm not very familiar with scraping in the first place.
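Separately from the classpath error, one detail in login() is worth checking: XPath addresses an attribute with @, as in //form[@id='login-form']. The following is a minimal sketch that uses only the JDK's javax.xml.xpath package rather than HtmlUnit, so it can be run without extra jars. The names XPathAttributeDemo and findFormById are made up for illustration, and the sample markup is an assumption, not Reddit's real page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XPathAttributeDemo {
    // Returns the first <form> whose id attribute equals the given id, or null if none matches.
    static Element findFormById(String xml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // '@id' addresses the id attribute of the <form> element.
            String expression = "//form[@id='" + id + "']";
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed sample markup with two forms.
        String xml = "<html><body>"
                + "<form id=\"search\" method=\"get\"/>"
                + "<form id=\"login-form\" method=\"post\"/>"
                + "</body></html>";
        Element form = findFormById(xml, "login-form");
        System.out.println(form.getAttribute("method")); // prints "post"
    }
}
```

HtmlUnit's getFirstByXPath takes the same kind of expression, so a selector that returns null there is a hint that the expression or the id does not match the page.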

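Try adding xml-apis to your classpath. As far as I know, org.w3c.dom.ElementTraversal ships in the xml-apis jar (1.4.01), so this error usually means that jar is missing when you compile.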
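To confirm whether the missing class really is the problem, a quick check is to ask the JVM to load it by name. This is only a diagnostic sketch; the class name ClasspathCheck and the helper isOnClasspath are made up for illustration.

```java
public class ClasspathCheck {
    // Returns true if the named class can be found by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.w3c.dom.ElementTraversal";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```

Run it with the same classpath you pass to the scraper. Since the error above is reported by the compiler, the jar has to be on the classpath for javac as well as for java.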

Related

Trying to access hidden tag (aria-hidden) using htmlUnit

I am trying to access reservationButton_time from the URL https://resy.com/cities/ny/holywater?date=2023-02-12&seats=2 using HtmlUnit. But when I call getPage() on the WebClient and print page.asXml(), the tag is hidden because of aria-hidden="{{!!isBackdropOpen}}", so I cannot see the same tags that the browser console shows.
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;
import java.io.IOException;
public class Scrape {
public static void run() {
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(false);
webClient.getOptions().setCssEnabled(false);
try {
HtmlPage page = webClient.getPage("https://resy.com/cities/ny/holywater");
System.out.println(page.asXml());
webClient.getCurrentWindow().getJobManager().removeAllJobs();
webClient.close();
} catch (IOException e) {
System.out.println("An error occurred: " + e);
}
}
}
I tried printing out the tags, but System.out.println(page.asXml()) only outputs a loading page.
The output I am expecting is the reservationButton_time tag.

Java Cucumber Using Keys From Property File In Examples Table

It's been a long time since I have worked with Scenario Outlines. What I wanted to achieve was to reference keys from my config_data.properties file in the Examples table, so that I have one source of truth for content/data for my tests.
I have been getting an error on the steps that need to get the data from the properties file, and when I run my test the value entered into the first name text box is firstName1, not the value from the properties file. The error in question:
org.openqa.selenium.WebDriverException: unknown error: keys should be a string
Here is what I have:
Feature File:
@new_test
Scenario Outline: User fills out the Personal Info Form With Valid Data
And I enter a first name as "<first_name>"
And I enter a middle name as "<middle_name>"
And I enter a last name as "<last_name>"
Examples:
|first_name | middle_name | last_name |
|firstName1 | middlename1 | lastname1 |
Step Definitions: (I think this is where the problem is)
public class PersonalInfoFormSteps {
private PersonalInfoFormPage personalInfo;
private DataReader data;
@When("^I enter a first name as \"([^\"]*)\"$")
public void i_enter_a_first_name_as(String first_name) throws Throwable {
personalInfo.getFirstNameField().click();
data.loadData().getProperty(first_name);
personalInfo.getFirstNameField().sendKeys(first_name);
}
}
DataReader.class (This works fine for getting data that does not involve scenario outline)
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;
import net.serenitybdd.core.pages.PageObject;
public class DataReader extends PageObject {
WebDriverWait wait = null;
private WebDriver driver;
String result = "";
InputStream inputStream;
File file = new File(
"file path to properties file goes here");
public DataReader(WebDriver driver) {
super();
}
public Properties loadData() throws IOException {
Properties prop = new Properties();
FileInputStream fileInput = null;
try {
fileInput = new FileInputStream(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
// load properties file
try {
prop.load(fileInput);
} catch (IOException e) {
e.printStackTrace();
}
return prop;
}
}
Properties File Data:
firstName1: Mickey
middleName1: M
lastName1: Mouse
Haha figured it out. I changed the code in the step definition:
@And("^I enter a first name as \"([^\"]*)\"$")
public void i_enter_a_first_name_as(String first_name) throws Throwable {
personalInfo.getFirstNameField().click();
personalInfo.getFirstNameField()
.sendKeys(data.loadData().getProperty(first_name));
}
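The difference between the two versions is that the original step looked up the property but threw the result away and typed the key itself, while the fixed step types the looked-up value. The lookup can be sketched on its own with plain java.util.Properties; the class PropertyLookupDemo and its helpers load and resolve are hypothetical names, and the properties text is inlined here instead of being read from a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertyLookupDemo {
    // Loads properties from text in the same "key: value" form as the file in the question.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        return props;
    }

    // Resolves a key from the Examples table, failing loudly instead of returning null.
    static String resolve(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("No property named '" + key + "'");
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = load("firstName1: Mickey\nmiddleName1: M\nlastName1: Mouse\n");
        System.out.println(resolve(props, "firstName1"));     // prints "Mickey"
        System.out.println(props.getProperty("middlename1")); // prints "null": keys are case-sensitive
    }
}
```

Note that the Examples table uses middlename1 and lastname1 while the properties file uses middleName1 and lastName1. Property keys are case-sensitive, so those two lookups would return null unless the spelling is aligned.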
Your step definition is wrong.
Replace this line:
@When("^I enter a first name as \"([^\"]*)\"$")
with this line:
@And("^I enter a first name as \"([^\"]*)\"$")
Or replace this line:
And I enter a first name as "<first_name>"
with this line:
When I enter a first name as "<first_name>"

Cannot submit a website form through Selenium

This is the second post on Stack Overflow on my quest to access this godforsaken website: https://portal.mcpsmd.org/guardian/home.html
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
public class WebAccessor {
public static void main(String[] args) {
WebDriver driver = new HtmlUnitDriver();
driver.get("https://portal.mcpsmd.org/public/");
System.out.println(driver.getCurrentUrl());
// Find the username and password inputs by their ids
WebElement username = driver.findElement(By.id("fieldAccount"));
WebElement password = driver.findElement(By.id("fieldPassword"));
// Enter the credentials (left blank here)
username.sendKeys("");
password.sendKeys("");
WebElement submitBtn = driver.findElement(By.id("btn-enter"));
submitBtn.click();
System.out.println(driver.getCurrentUrl());
driver.quit();
}
}
This code is tested and works on Facebook.
I am sure that my button is being pressed, because when I click submit the site URL changes from
https://portal.mcpsmd.org/public/
to
https://portal.mcpsmd.org/guardian/home.html
When I type in usernames and passwords, (actual user and pass cannot be disclosed for obvious reasons), the password line actually tacks on another 20 or so characters to the end of the password field. (You can see this by typing in any random username and password and clicking submit).
This has led me to believe there is some sort of front-end encryption going on. Is there any feasible way to log in?
Many thanks in advance.
Since I have no credentials, my answer is just a guess.
But I think you should navigate to the landing page after login, with a small tweak to avoid exceptions, like this:
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class WebAccessor {
public static void main(String[] args) {
WebClient WEB_CLIENT = new WebClient(BrowserVersion.CHROME);
WEB_CLIENT.getCookieManager().setCookiesEnabled(true);
HtmlPage loginPage;
try {
loginPage = WEB_CLIENT.getPage("https://portal.mcpsmd.org/public/");
HtmlForm loginForm = loginPage.getFirstByXPath("//form[@id='LoginForm']");
loginForm.getInputByName("account").setValueAttribute("YOURUSERNAME");
loginForm.getInputByName("pw").setValueAttribute("YOURPASSWORD");
loginForm.getElementsByTagName("button").get(0).click();
HtmlPage landing = WEB_CLIENT.getPage("https://portal.mcpsmd.org/guardian/home.html#/termGrades");
System.out.println(landing.getTitleText());
} catch (FailingHttpStatusCodeException e) {
// TODO Auto-generated catch block
//e.printStackTrace();
} catch (MalformedURLException e) {
// TODO Auto-generated catch block
//e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
//e.printStackTrace();
}
}
}
My output is: Student and Parent Sign In. But if you set correct attributes, it should be ok.

How to recall unique text using java div class

I'd like to use one piece of text in multiple HTML files, something like a greeting. Let's say the greeting is:
"Hello, if you have any questions please contact me."
What I want is to recall that text on every HTML page, so that if I change it later, the change appears on all the HTML pages.
I am weak on Java, but I think I need to create some JavaScript and recall the text with a div class function, the way the Facebook button is made.
P.S. Facebook button recall:
<div class="fb-like" data-href="https://developers.facebook.com/docs/plugins/" data-layout="standard" data-action="like" data-show-faces="true" data-share="true">
With JavaScript you can change the content of a tag, for example with jQuery's html() function, or you could include the resource. I guess it depends on the technology being used.
In the simplest form, you can create a function in your javascript master copy and make a document.write call. You would need to call that script file on every page.
function greetingMessage() {
document.write('your message');
};
Then
call greetingMessage();
you can also put the javascript in a master file and then have the div in each HTML page:
function greetingMessage(){
document.getElementById('Message').innerHTML = 'Your Message';
};
HTML:
<body onload="greetingMessage();">
<div id="Message" style="color:red;"></div>
If you are using JSP's or Servlets you can have a resource/properties file that contains many Strings being used throughout your application. The properties file would contain key=value pairs. You could then simply reference a particular key in the properties file, for instance:
greeting=Hello, if you have any questions please contact me
The key is "greeting", the value is "Hello, if you have any questions please contact me"
To read in the properties file you would use the Properties class like so:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.Properties;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class MyWebPage extends HttpServlet
{
    public void doGet(HttpServletRequest req, HttpServletResponse res) throws IOException
    {
        // Set the content type before asking for the writer.
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        out.print("<html><head></head><body><div class=\"someclass\">" +
            getGreeting() + "</div>" +
            "</body></html>"
        );
    }
    public String getGreeting()
    {
        String greeting = "";
        // try-with-resources closes the stream even if loading fails
        try (InputStream input = new FileInputStream("config.properties")) {
            Properties prop = new Properties();
            // load a properties file
            prop.load(input);
            greeting = prop.getProperty("greeting");
        }
        catch(IOException ioe){ioe.printStackTrace();}
        return greeting;
    }
}
The same sort of thing can be effectively used in Java Server Pages. Hope this helps.

Javascript to Java Applet communication

I am trying to pass a selected value from HTML drop-down to an Applet method, using setter method in the Applet. But every time the Javascript is invoked it shows "object doesn't support this property or method" as an exception.
My javascript code :
function showSelected(value){
alert("the value given from"+value);
var diseasename=value;
alert(diseasename);
document.decisiontreeapplet.setDieasename(diseasename);
alert("i am after value set ");
}
My applet code :
package com.vaannila.utility;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import prefuse.util.ui.JPrefuseApplet;
public class dynamicTreeApplet extends JPrefuseApplet {
private static final long serialVersionUID = 1L;
public static int i = 1;
public String dieasenameencode;
//System.out.println("asjdjkhcd"+dieasenameencode);
public void init() {
System.out.println("asjdjkhcd"+dieasenameencode);
System.out.println("the value of i is " + i);
URL url = null;
//String ashu=this.getParameter("dieasenmae");
//System.out.println("the value of the dieases is "+ashu);
//Here dieasesname is important to make the page refresh happen
//String dencode = dieasenameencode.trim();
try {
//String dieasename = URLEncoder.encode(dencode, "UTF-8");
// i want this piece of the code to be called
url = new URL("http://localhost:8080/docRuleToolProtocol/appletRefreshAction.do?dieasename="+dieasenameencode);
URLConnection con = url.openConnection();
con.setDoOutput(true);
con.setDoInput(true);
con.setUseCaches(false);
InputStream ois = con.getInputStream();
this.setContentPane(dynamicView.demo(ois, "name"));
ois.close();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (FileNotFoundException f) {
f.printStackTrace();
} catch (IOException io) {
io.printStackTrace();
}
++i;
}
public void setDieasename(String message){
System.out.println("atleast i am here and call is made ");
this.dieasenameencode=message;
System.out.println("the final value of the dieasenmae"+dieasenameencode);
}
}
My applet deployment code:
<applet id="decisiontreeapplet" code="com.vaannila.utility.dynamicTreeApplet.class" archive="./appletjars/dynamictree.jar, ./appletjars/prefuse.jar" width ="1000" height="500" >
</applet>
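One thing that stands out in the applet code above: init() builds the URL from dieasenameencode, which appears to still be null at that point, because setDieasename is only called later from JavaScript. The commented-out lines also hint that the value was meant to be trimmed and URL-encoded. A minimal sketch of building the URL once the value is known follows; the names RefreshUrlDemo and buildRefreshUrl are made up for illustration, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RefreshUrlDemo {
    // Base address taken from the applet code in the question.
    static final String BASE =
            "http://localhost:8080/docRuleToolProtocol/appletRefreshAction.do";

    // Builds the refresh URL, encoding the value so spaces and symbols survive the query string.
    static String buildRefreshUrl(String diseaseName) {
        if (diseaseName == null) {
            throw new IllegalArgumentException("disease name has not been set yet");
        }
        String encoded = URLEncoder.encode(diseaseName.trim(), StandardCharsets.UTF_8);
        return BASE + "?dieasename=" + encoded;
    }

    public static void main(String[] args) {
        // The query part comes out as dieasename=heart+disease+%26+stroke
        System.out.println(buildRefreshUrl(" heart disease & stroke "));
    }
}
```

In the applet, a call like this would belong in the code path that runs after setDieasename, not in init().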
Change..
document.decisiontreeapplet
..to..
document.getElementById('decisiontreeapplet')
..and it will most likely work.
E.G.
HTML
<html>
<body>
<script type='text/javascript'>
function callApplet() {
msg = document.getElementById('input').value;
applet = document.getElementById('output');
applet.setMessage(msg);
}
</script>
<input id='input' type='text' size=20 onchange='callApplet()'>
<br>
<applet
id='output'
code='CallApplet'
width=120
height=20>
</applet>
</body>
</html>
Java
import javax.swing.*;
public class CallApplet extends JApplet {
JTextField output;
public void init() {
output = new JTextField(20);
add(output);
validate();
}
public void setMessage(String message) {
output.setText(message);
}
}
Please also consider posting a short, complete example next time. Note that the two sources shown above together have fewer lines than your applet alone, and it took me longer to prepare the source so I could check my answer.
Try changing the id parameter in your applet tag to name instead.
<applet name="decisiontreeapplet" ...>
</applet>
Try passing parameters using the param tag:
http://download.oracle.com/javase/tutorial/deployment/applet/param.html
I think the <applet> tag is obsolete and the <object> tag should be used instead. I recall there was a boolean param named scriptable in the object tag.
Why not use the Deployment Toolkit? It would save you a lot of trial and error; see http://rostislav-matl.blogspot.com/2011/10/java-applets-building-with-maven.html for more info.
