Parse html content for a value


I receive an HTTP response as an HTML string, and I would like to scrape a certain value stored inside the ReportViewer1 variable.
<html>
....................
...........
<script type="text/javascript">
var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell', {
CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',
ExportButtonText: 'Export',
ExportToolTip: 'Export',
ExportSelectFormatText: 'Export to the selected format',
FirstPageToolTip: 'First page',
LabelOf: 'of',
LastPageToolTip: 'Last Page',
ProcessingReportMessage: 'Generating report...',
NoPageToDisplay: 'No page to display.',
NextPageToolTip: 'Next page',
ParametersToolTip: 'Click to close parameters area|Click to open parameters area',
DocumentMapToolTip: 'Hide document map|Show document map',
PreviousPageToolTip: 'Previous page',
TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',
SessionHasExpiredError: 'Session has expired.',
SessionHasExpiredMessage: 'Please, refresh the page.',
PrintToolTip: 'Print',
RefreshToolTip: 'Refresh',
NavigateBackToolTip: 'Navigate back',
NavigateForwardToolTip: 'Navigate forward',
ReportParametersSelectAllText: '<select all>',
ReportParametersSelectAValueText: '<select a value>',
ReportParametersInvalidValueText: 'Invalid value.',
ReportParametersNoValueText: 'Value required.',
ReportParametersNullText: 'NULL',
ReportParametersPreviewButtonText: 'Preview',
ReportParametersFalseValueLabel: 'False',
ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',
ReportParametersTrueValueLabel: 'True',
MissingReportSource: 'The source of the report definition has not been specified.',
ZoomToPageWidth: 'Page Width',
ZoomToWholePage: 'Full Page'
}, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);
</script>
...................
...................
</html>
The value is a90a0d41efa6429eadfefa42fc529de1, and it appears in the middle of this content:
'/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100',
What's the best way to parse this value using Java?

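Parse the HTML with the String class:
public class HtmlParser {
    public static void main(String[] args) {
        String result = getValuesProp(html);
        System.out.println("Result: " + result);
    }
    // Text that sits just before the value we want.
    static final String PIVOT = "Telerik.ReportViewer.axd";
    // Returns the quoted argument that follows PIVOT, or null if it cannot be found.
    public static String getValuesProp(String content) {
        int pivot = content.indexOf(PIVOT);
        if (pivot < 0) {
            return null;
        }
        // Skip the closing quote of the pivot argument, then find the opening quote of the next one.
        int start = content.indexOf('\'', pivot + PIVOT.length() + 1);
        if (start < 0) {
            return null;
        }
        // The value ends at the next single quote.
        int end = content.indexOf('\'', start + 1);
        if (end < 0) {
            return null;
        }
        return content.substring(start + 1, end);
    }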
static String html ="<html>\n" +
"\n" +
"<script type=\"text/javascript\">\n" +
" var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell', {\n" +
" CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',\n" +
" ExportButtonText: 'Export',\n" +
" ExportToolTip: 'Export',\n" +
" ExportSelectFormatText: 'Export to the selected format',\n" +
" FirstPageToolTip: 'First page',\n" +
" LabelOf: 'of',\n" +
" LastPageToolTip: 'Last Page',\n" +
" ProcessingReportMessage: 'Generating report...',\n" +
" NoPageToDisplay: 'No page to display.',\n" +
" NextPageToolTip: 'Next page',\n" +
" ParametersToolTip: 'Click to close parameters area|Click to open parameters area',\n" +
" DocumentMapToolTip: 'Hide document map|Show document map',\n" +
" PreviousPageToolTip: 'Previous page',\n" +
" TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',\n" +
" SessionHasExpiredError: 'Session has expired.',\n" +
" SessionHasExpiredMessage: 'Please, refresh the page.',\n" +
" PrintToolTip: 'Print',\n" +
" RefreshToolTip: 'Refresh',\n" +
" NavigateBackToolTip: 'Navigate back',\n" +
" NavigateForwardToolTip: 'Navigate forward',\n" +
" ReportParametersSelectAllText: '<select all>',\n" +
" ReportParametersSelectAValueText: '<select a value>',\n" +
" ReportParametersInvalidValueText: 'Invalid value.',\n" +
" ReportParametersNoValueText: 'Value required.',\n" +
" ReportParametersNullText: 'NULL',\n" +
" ReportParametersPreviewButtonText: 'Preview',\n" +
" ReportParametersFalseValueLabel: 'False',\n" +
" ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',\n" +
" ReportParametersTrueValueLabel: 'True',\n" +
" MissingReportSource: 'The source of the report definition has not been specified.',\n" +
" ZoomToPageWidth: 'Page Width',\n" +
" ZoomToWholePage: 'Full Page'\n" +
" }, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);\n" +
" </script>\n" +
"\n" +
"</html>";
}
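As an alternative to manual index arithmetic, the same lookup can be sketched with a regular expression. This is only a sketch under the assumption that the value is always the single-quoted argument directly after the quoted Telerik.ReportViewer.axd path; the class and method names (ReportViewerIdRegex, extractId) are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReportViewerIdRegex {
    // A quoted string ending in Telerik.ReportViewer.axd, a comma,
    // then the next quoted string, which is captured as group 1.
    private static final Pattern ID_AFTER_HANDLER = Pattern.compile(
            "'[^']*Telerik\\.ReportViewer\\.axd'\\s*,\\s*'([^']*)'");

    // Returns the captured value, or an empty Optional if the pattern is absent.
    static Optional<String> extractId(String html) {
        Matcher m = ID_AFTER_HANDLER.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<script>var ReportViewer1 = new ReportViewer('ReportViewer1_CP', "
                + "'/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', "
                + "'Percent', '100');</script>";
        System.out.println(extractId(html).orElse("not found"));
    }
}
```

Returning an Optional makes the "not found" case explicit instead of failing with an index error when the page layout changes.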

I would read the text one line at a time, the way most files are read. Because the format will always be the same, look for the line that begins with var ReportViewer1; then you know you have found the line you want. You may need to strip leading whitespace first, although the page will probably always be formatted with the same whitespace too.
When you have the line, use the String split() method to split it into an array. There are convenient delimiters to split on, such as "," or ", "; see what works best for you.
Search the resulting parts for '/app/Telerik.ReportViewer.axd'. The next element of the array is the value you are looking for (still wrapped in single quotes, which you can strip).
This relies on the formatting staying the same, so study the HTML to make sure the line always follows this format; looking at it, I assume it does.
In short: find the line, split it on a delimiter, and pick the element that follows the marker.
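The steps described above could be sketched roughly as follows. It assumes the whole constructor call up to the value sits on one line and that no argument before the value contains a comma; ReportViewerIdSplit and extractId are hypothetical names used only for this illustration.

```java
import java.util.Optional;

final class ReportViewerIdSplit {
    private static final String LINE_PREFIX = "var ReportViewer1";
    private static final String MARKER = "'/app/Telerik.ReportViewer.axd'";

    static Optional<String> extractId(String html) {
        // "\\R" matches any line break, so this works for \n and \r\n alike.
        for (String line : html.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith(LINE_PREFIX)) {
                continue;
            }
            // Assumption: arguments before the value contain no commas themselves.
            String[] parts = trimmed.split(",");
            for (int i = 0; i < parts.length - 1; i++) {
                if (parts[i].trim().equals(MARKER)) {
                    // The element after the marker is the value; drop its quotes.
                    return Optional.of(parts[i + 1].trim().replace("'", ""));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html>\n<script type=\"text/javascript\">\n"
                + "  var ReportViewer1 = new ReportViewer('ReportViewer1_CP', "
                + "'/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100');\n"
                + "</script>\n</html>";
        System.out.println(extractId(html).orElse("not found"));
    }
}
```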

Related

Parsing a specific text value with JSoup

Hey does anyone know how to parse the "Light rain", " 7°C", and "Limited"? These are stored as #text so that's kind of throwing me off. For reference, to parse "Temperature:", it would be Element element5 = doc.select("strong").get(3);
Thanks!

The nodes from your example are called text nodes. In Jsoup, you can read the text of an element with the text() method. So, given your example, we'd select the td element and then call text() to get its text value.
However, text() also includes the text of any child elements, so in your case it would produce Weather: Light rain as a single string. Fortunately, Jsoup also has an ownText() method that only returns the text nodes that are direct children of the element (not those of nested elements). So given your example code, you could write it like this:
Element element5 = doc.select("td").get(3);
String value = element5.ownText();

You can extract the required text in various ways. One of them is td.childNode(1).toString(); a complete solution is shown below:
public static void main(String[] args) {
// Parse HTML String using JSoup library
String HTMLSTring = "<html>\n" +
" <head></head>\n" +
" <body>\n" +
" <table class=\"table\"> \n" +
" <tbody>\n" +
" <tr> \n" +
" <td><strong>Weather: </strong>Light Rain</td> \n" +
" </tr> \n" +
" <tr> \n" +
" <td><strong>Tempratue: </strong>70 C</td> \n" +
" </tr> \n" +
" <tr> \n" +
" <td><strong>Visibility: </strong>Limited</td> \n" +
" </tr> \n" +
" <tr> \n" +
" <td><strong>Runs open: </strong>0</td> \n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
" </body>\n" +
"</html>"
+ "<head></head>";
Document html = Jsoup.parse(HTMLSTring);
Elements tds = html.getElementsByTag("td");
for (Element td : tds) {
//String tdStrongText = td.childNode(0).childNodes().get(0).toString();
String tdStrongText = td.select("strong").text();
System.out.print(tdStrongText + " : ");
String tdText = td.childNode(1).toString();
System.out.println(tdText);
}
}

Checking if a button is disabled or not

My issue is that Selenium reports the next arrow button as enabled when it is actually disabled (grayed out). What I am trying to do is this:
1. Click the next arrow button
2. Sleep for 5 seconds
3. Check if it is disabled
4. Click the next arrow button
5. Check if it is disabled
(loop: repeat steps 1-5)
If the button is disabled, break out of the do-while loop.
My code, which is not working, is below:
PS_OBJ_CycleData.Nextbtn(driver).click();
Thread.sleep(5000);
WebElement element = driver.findElement(By.id("changeStartWeekGrid_next"));
if (element.isEnabled()) {
System.out.println("Good next arrow enabled");
} else {
System.out.println("next arrow disabled");
PS_OBJ_CycleData.Cancelbtn(driver).click();
break dowhileloop;
}
My console output is "Good next arrow enabled" instead of going to the else statement.
The button HTML is here:
<div id="changeStartWeekGrid_next" class="paginationButton" disabled="disabled" data-xpal="xpath-selected">
<a tabindex="0" href="#" id="changeStartWeekGrid_next_link" onclick="var registry = require('dijit/registry'); registry.byId('changeStartWeekGrid').next(); return false;">
<span class="icon-pagination-next"></span>
</a>
</div>
As you can see, the button is actually disabled. Is there another way to check whether the button is really disabled? Any help would be appreciated.

Sadly, the isEnabled method doesn't work in this case. As its documentation states:
This will generally return true for everything but disabled input elements.
A proper alternative is to use JavaScript to check for the attribute's existence and its value. You can inject JavaScript through the executeScript method of the JavascriptExecutor interface, which the standard WebDriver implementations provide. The first argument is the script; all following arguments are passed to the script and are accessible as arguments[0], arguments[1], and so on. The result comes back as an Object, so it has to be cast.
For example:
Boolean disabled = (Boolean) ((JavascriptExecutor) driver).executeScript("return arguments[0].hasAttribute(\"disabled\");", element);

In this case, since I did not have an actual button, I needed to read its disabled attribute to see whether it was disabled.
PS_OBJ_CycleData.Nextbtn(driver).click();
Thread.sleep(4000);
// check if the arrow button is disabled
if (driver.findElement(By.id("changeStartWeekGrid_next")).getAttribute("disabled") != null) {
PS_OBJ_CycleData.Cancelbtn(driver).click();
break dowhileloop;
}

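You can check it with this simple code (note that isEnabled() only reflects the real state for native form controls such as input or button elements):
boolean isButtonEnabled = button1.isEnabled();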

Make sure you have the correct element. I've wasted hours trying to figure out why an element was enabled when it shouldn't have been, when I was actually looking at the wrong one! Inspecting the element in the browser did not help, because it wasn't the same element that the java code was looking at. The following code turned out to be helpful:
System.out.println("Actual element=" + describeElement(yourElement));
public static String describeElement(WebElement element) {
String result = "";
if (element == null ) {
log.error("Could not describe null Element");
return "null";
}
// Look for common attributes, such as id, name, value, title, placeholder, type, href, target, role, class,
String id = element.getAttribute("id");
String name = element.getAttribute("name");
String value = element.getAttribute("value");
String title = element.getAttribute("title");
String placeholder = element.getAttribute("placeholder");
String type = element.getAttribute("type");
String href = element.getAttribute("href");
String target = element.getAttribute("target");
String role = element.getAttribute("role");
String thisClass = element.getAttribute("class");
result = "WebElement [tag:" + element.getTagName() + " text:'" + limit(element.getText()) + "' id:'" + id + "' " +
// Each attribute is only printed when it is actually present on the element.
(StringUtils.isEmpty(name) ? "" : (" name:'" + name + "' ")) +
(StringUtils.isEmpty(value) ? "" : (" value:'" + value + "' ")) +
(StringUtils.isEmpty(title) ? "" : (" title:'" + title + "' ")) +
(StringUtils.isEmpty(placeholder) ? "" : (" placeholder:'" + placeholder + "' ")) +
(StringUtils.isEmpty(type) ? "" : (" type:'" + type + "' ")) +
(StringUtils.isEmpty(href) ? "" : (" href:'" + href + "' ")) +
(StringUtils.isEmpty(target) ? "" : (" target:'" + target + "' ")) +
(StringUtils.isEmpty(role) ? "" : (" role:'" + role + "' ")) +
(StringUtils.isEmpty(thisClass) ? "" : (" class:'" + thisClass + "' ")) +
" isDisplayed: " + element.isDisplayed() +
" isEnabled: " + element.isEnabled() +
" isSelected: " + element.isSelected() + "]";
return result;
}

Unable to Select option from dropdown using JavascriptExecutor

Can anyone provide me a failsafe(ish) method for selecting text from dropdowns on this page I am practicing on?
https://www.club18-30.com/club18-30
Specifically, the 'from' and 'to' airport dropdowns. I am using the following code:
public void selectWhereFrom(String query, String whereFromSelect) throws InterruptedException {
WebElement dropDownContainer = driver.findElement(By.xpath(departureAirportLocator));
dropDownContainer.click();
selectOption(query,whereFromSelect);
}
public void selectOption(String query, String option) {
String script =
"function selectOption(s) {\r\n" +
" var sel = document.querySelector(' " + query + "');\r\n" +
" for (var i = 0; i < sel.options.length; i++)\r\n" +
" {\r\n" +
" if (sel.options[i].text.indexOf(s) > -1)\r\n" +
" {\r\n" +
" sel.options[i].selected = true;\r\n" +
" break;\r\n" +
" }\r\n" +
" }\r\n" +
"}\r\n" +
"return selectOption('" + option + "');";
javaScriptExecutor(script);
}
This seems to successfully populate the box with text but when I hit 'Search' I then receive a message saying I need to select an option, suggesting it has not registered the selection?
I would rather avoid JavaScriptExecutor but haven't been able to make these Selects work with a regular Selenium Select mechanism

I would set up a function for each dropdown, one for setting the departure airport and another for setting the destination airport. I've tested the code below and it works.
The functions
public static void setDepartureAirport(String airport)
{
driver.findElement(By.cssSelector("div.departureAirport div.departurePoint")).click();
String xpath = "//div[contains(@class, 'departurePoint')]//ul//li[contains(@class, 'custom-select-option') and contains(text(), '"
+ airport + "')]";
driver.findElement(By.xpath(xpath)).click();
}
public static void setDestinationAirport(String airport)
{
driver.findElement(By.cssSelector("div.destinationAirport div.airportSelect")).click();
String xpath = "//div[contains(@class, 'destinationAirport')]//ul//li[contains(@class, 'custom-select-option') and contains(text(), '"
+ airport + "')]";
driver.findElement(By.xpath(xpath)).click();
}
and you call them like
driver.get("https://www.club18-30.com/club18-30");
setDepartureAirport("(MAN)");
setDestinationAirport("(IBZ)");
I would suggest that you use the 3-letter airport codes for your search, e.g. "(MAN)" for Manchester. That will be unique to each airport but you can use any unique part of the text.

`rtserver-id` turns to `rtserver - id` in java string

I have this code:
public void foo (){
String script =
"var aLocation = {};" +
"var aOffer = {};" +
"var aAdData = " +
"{ " +
"location: aLocation, " +
"offer: aOffer " +
" };" +
"var aClientEnv = " +
" { " +
" sessionid: \"\", " +
" cookie: \"\", " +
" rtserver-id: 1, " +
" lon: 34.847, " +
" lat: 32.123, " +
" venue: \"\", " +
" venue_context: \"\", " +
" source: \"\"," + // One of the following (string) values: ADS_PIN_INFO,
// ADS_0SPEED_INFO, ADS_LINE_SEARCH_INFO,
// ADS_ARROW_NEARBY_INFO, ADS_CATEGORY_AUTOCOMPLETE_INFO,
// ADS_HISTORY_LIST_INFO
// (this field is also called "channel")
" locale: \"\"" + // ISO639-1 language code (2-5 characters), supported formats:
" };" +
"W.setOffer(aAdData, aClientEnv);";
javascriptExecutor.executeScript(script);
}
I have two questions:
1. When I debug and copy the script value, I see a member rtserver - id instead of rtserver-id. How can that be? The code throws an exception because of this.
2. Even if I remove the rtserver-id member (so that no exception is thrown), evaluating aLocation in the browser console gives "variable not defined". How can this be?

rtserver-id isn't a valid identifier - so if you want it as a field/property name, you need to quote it. You can see this in a Chrome Javascript console, with no need for any Java involved:
> var aClientEnv = { sessionId: "", rtserver-id: 1 };
Uncaught SyntaxError: Unexpected token -
> var aClientEnv = { sessionId: "", "rtserver-id": 1 };
undefined
> aClientEnv
Object {sessionId: "", rtserver-id: 1}
Basically I don't think anything's adding spaces - you've just got an invalid script. You can easily add the quotes in your Java code:
" \"rtserver-id\": 1, " +

How to get text between two Elements in DOM object?

I'm using JSoup to parse this HTML content:
<div class="submitted">
<strong><a title="View user profile." href="/user/1">user1</a></strong>
on 27/09/2011 - 15:17
<span class="via">www.google.com</span>
</div>
Which looks like this in web browser:
user1 on 27/09/2011 - 15:17 www.google.com
The username and the website can be parsed into variables using this:
String user = content.getElementsByClass("submitted").first().getElementsByTag("strong").first().text();
String website = content.getElementsByClass("submitted").first().getElementsByClass("via").first().text();
But I'm unsure how to get the "on 27/09/2011 - 15:17" part into a variable. If I use
String date = content.getElementsByClass("submitted").first().text();
the result also contains the username and the website.

You can always remove the user and the website elements like this (you can clone your submitted element if you do not want the remove actions to "damage" your document):
public static void main(String[] args) throws Exception {
Document content = Jsoup.parse(
"<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\">www.google.com</span>" +
"</div> ");
// create a clone of the element so we do not destroy the original
Element submitted = content.getElementsByClass("submitted").first().clone();
// remove the elements that you do not need
submitted.getElementsByTag("strong").remove();
submitted.getElementsByClass("via").remove();
// print the result (demo)
System.out.println(submitted.text());
}
Outputs:
on 27/09/2011 - 15:17

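You can then parse the string that you get. Assuming contentString holds the full text of the div (user1 on 27/09/2011 - 15:17 www.google.com), split it on spaces:
String[] parts = contentString.split(" ");
Then you can construct the string you want like this:
String date = parts[1] + " " + parts[2] + " - " + parts[4];
This gives you the string you need.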
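Splitting on spaces depends on the exact token positions. As an alternative sketch (not part of the answer above), a regular expression can pick the date and time out of the text wherever they appear, and java.time can turn them into a value. The dd/MM/yyyy HH:mm format is an assumption based on the single sample in the question, and SubmittedDateParser is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubmittedDateParser {
    // Group 1: the date (e.g. 27/09/2011); group 2: the time (e.g. 15:17).
    private static final Pattern DATE_TIME =
            Pattern.compile("(\\d{2}/\\d{2}/\\d{4})\\s*-\\s*(\\d{2}:\\d{2})");
    // Assumed format: day first, 24-hour clock.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm");

    // Returns the first date/time found in the text, or empty if none matches.
    // Throws DateTimeParseException if the digits do not form a valid date.
    static Optional<LocalDateTime> parse(String text) {
        Matcher m = DATE_TIME.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(LocalDateTime.parse(m.group(1) + " " + m.group(2), FORMAT));
    }

    public static void main(String[] args) {
        String text = "user1 on 27/09/2011 - 15:17 www.google.com";
        System.out.println(parse(text).map(LocalDateTime::toString).orElse("no date found"));
    }
}
```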

Select the element before the text you wish to grab, then get its next sibling node (not element), which is a text node:
Document doc = Jsoup.parse("<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\">www.google.com</span>" +
"</div> ");
String str = doc.select("strong").first().nextSibling().toString().trim();
System.out.println(str);
You can also ask an element for its child text nodes and index directly (though referencing the nodes by sibling is usually more robust than indexing):
Document doc = Jsoup.parse(
"<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\">www.google.com</span>" +
"</div> ");
String str = doc.select("div").first().textNodes().get(1).text().trim();
System.out.println(str);
