Parsing a specific text value with JSoup - java

Hey, does anyone know how to parse "Light rain", " 7°C", and "Limited"? These are stored as #text nodes, so that's kind of throwing me off. For reference, to parse "Temperature:" I use Element element5 = doc.select("strong").get(3);
Thanks!

The nodes from your example are called text nodes. In Jsoup, you can read the text of an element by using the text() method, so given your example we'd select the td element and then use text() to get its text value.
However, text() also includes the text of any child elements, so in your case it would produce Weather: Light rain as a single string. Fortunately, Jsoup also has an ownText() method that only reads the text nodes that are direct children of the element (and not the text of nested elements). So given your example code, you could write it like this:
Element element5 = doc.select("td").get(3);
String value = element5.ownText(); // only the td's own text, without the <strong> label
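A self-contained sketch of the ownText() approach applied to every row, assuming jsoup is on the classpath and that each value sits in a td right after a <strong> label (the layout the question's "Temperature:" selector suggests). The class and method names (WeatherTableParser, parseRows) are illustrative, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class WeatherTableParser {

    // Maps each label (the <strong> text, e.g. "Weather:") to the text that follows it in the same <td>.
    public static Map<String, String> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> values = new LinkedHashMap<>();
        for (Element td : doc.select("td")) {
            Element label = td.select("strong").first();
            if (label == null) {
                continue; // skip cells that have no <strong> label
            }
            // td.text() would return "Weather: Light rain"; ownText() skips the <strong> child
            values.put(label.text(), td.ownText());
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<table>"
                + "<tr><td><strong>Weather: </strong>Light rain</td></tr>"
                + "<tr><td><strong>Temperature: </strong>7\u00B0C</td></tr>"
                + "<tr><td><strong>Visibility: </strong>Limited</td></tr>"
                + "</table>";
        parseRows(html).forEach((label, value) -> System.out.println(label + " " + value));
    }
}
```

Keying a LinkedHashMap by label keeps the rows in page order and avoids relying on a fixed index such as get(3).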

You can use various ways to extract the required text; one of them is td.childNode(1).toString(). A complete solution:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableTextParser {
public static void main(String[] args) {
// Parse HTML String using JSoup library
String html = "<html>\n" +
" <head></head>\n" +
" <body>\n" +
" <table class=\"table\"> \n" +
" <tbody>\n" +
" <tr> \n" +
" <td><strong>Weather: </strong>Light Rain</td> \n" +
" </tr> \n" +
" <tr> \n" +
" <td><strong>Temperature: </strong>70 C</td> \n" +
" </tr> \n" +
" <tr> \n" +
" <td><strong>Visibility: </strong>Limited</td> \n" +
" </tr> \n" +
" <tr> \n" +
" <td><strong>Runs open: </strong>0</td> \n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
" </body>\n" +
"</html>";
Document doc = Jsoup.parse(html);
Elements tds = doc.getElementsByTag("td");
for (Element td : tds) {
// label: the text of the <strong> child, e.g. "Weather:"
//String tdStrongText = td.childNode(0).childNodes().get(0).toString();
String tdStrongText = td.select("strong").text();
System.out.print(tdStrongText + " : ");
// value: the text node right after <strong>, i.e. the td's second child node
String tdText = td.childNode(1).toString();
System.out.println(tdText);
}
}
}

Related

Parse html content for a value

I receive an HTTP response as an HTML string and I would like to scrape a certain value stored inside the ReportViewer1 variable.
<html>
....................
...........
<script type="text/javascript">
var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell', {
CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',
ExportButtonText: 'Export',
ExportToolTip: 'Export',
ExportSelectFormatText: 'Export to the selected format',
FirstPageToolTip: 'First page',
LabelOf: 'of',
LastPageToolTip: 'Last Page',
ProcessingReportMessage: 'Generating report...',
NoPageToDisplay: 'No page to display.',
NextPageToolTip: 'Next page',
ParametersToolTip: 'Click to close parameters area|Click to open parameters area',
DocumentMapToolTip: 'Hide document map|Show document map',
PreviousPageToolTip: 'Previous page',
TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',
SessionHasExpiredError: 'Session has expired.',
SessionHasExpiredMessage: 'Please, refresh the page.',
PrintToolTip: 'Print',
RefreshToolTip: 'Refresh',
NavigateBackToolTip: 'Navigate back',
NavigateForwardToolTip: 'Navigate forward',
ReportParametersSelectAllText: '<select all>',
ReportParametersSelectAValueText: '<select a value>',
ReportParametersInvalidValueText: 'Invalid value.',
ReportParametersNoValueText: 'Value required.',
ReportParametersNullText: 'NULL',
ReportParametersPreviewButtonText: 'Preview',
ReportParametersFalseValueLabel: 'False',
ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',
ReportParametersTrueValueLabel: 'True',
MissingReportSource: 'The source of the report definition has not been specified.',
ZoomToPageWidth: 'Page Width',
ZoomToWholePage: 'Full Page'
}, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);
</script>
...................
...................
</html>
The value is a90a0d41efa6429eadfefa42fc529de1 and this is in the middle of this content:
'/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100',
What's the best way to parse this value using Java?
Parse the HTML with the String class:
public class HtmlParser {
public static void main(String args[]){
String result = getValuesProp(html);
System.out.println("Result: "+ result);
}
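static String PIVOT = "Telerik.ReportViewer.axd";
// Returns the quoted value that follows PIVOT, or null if PIVOT is not present.
public static String getValuesProp(String source) {
String subString;
int i = source.indexOf(PIVOT);
if (i < 0) {
return null;
}
// skip past the pivot itself
i += PIVOT.length();
// skip the closing quote and the comma (', chars)
i += 2;
subString = source.substring(i);
// move to just after the opening quote of the next argument
i = subString.indexOf("'");
i++;
subString = subString.substring(i);
// cut at the closing quote
i = subString.indexOf("'");
subString = subString.substring(0, i);
return subString;
}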
static String html ="<html>\n" +
"\n" +
"<script type=\"text/javascript\">\n" +
" var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell', {\n" +
" CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',\n" +
" ExportButtonText: 'Export',\n" +
" ExportToolTip: 'Export',\n" +
" ExportSelectFormatText: 'Export to the selected format',\n" +
" FirstPageToolTip: 'First page',\n" +
" LabelOf: 'of',\n" +
" LastPageToolTip: 'Last Page',\n" +
" ProcessingReportMessage: 'Generating report...',\n" +
" NoPageToDisplay: 'No page to display.',\n" +
" NextPageToolTip: 'Next page',\n" +
" ParametersToolTip: 'Click to close parameters area|Click to open parameters area',\n" +
" DocumentMapToolTip: 'Hide document map|Show document map',\n" +
" PreviousPageToolTip: 'Previous page',\n" +
" TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',\n" +
" SessionHasExpiredError: 'Session has expired.',\n" +
" SessionHasExpiredMessage: 'Please, refresh the page.',\n" +
" PrintToolTip: 'Print',\n" +
" RefreshToolTip: 'Refresh',\n" +
" NavigateBackToolTip: 'Navigate back',\n" +
" NavigateForwardToolTip: 'Navigate forward',\n" +
" ReportParametersSelectAllText: '<select all>',\n" +
" ReportParametersSelectAValueText: '<select a value>',\n" +
" ReportParametersInvalidValueText: 'Invalid value.',\n" +
" ReportParametersNoValueText: 'Value required.',\n" +
" ReportParametersNullText: 'NULL',\n" +
" ReportParametersPreviewButtonText: 'Preview',\n" +
" ReportParametersFalseValueLabel: 'False',\n" +
" ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',\n" +
" ReportParametersTrueValueLabel: 'True',\n" +
" MissingReportSource: 'The source of the report definition has not been specified.',\n" +
" ZoomToPageWidth: 'Page Width',\n" +
" ZoomToWholePage: 'Full Page'\n" +
" }, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);\n" +
" </script>\n" +
"\n" +
"</html>";
}
I would read the text a line at a time, the way most files are read. Because the format will always be the same, you look for a line that begins with the characters "var ReportViewer1"; that is the line you want. You may need to strip some leading whitespace, although it will always be formatted with the same whitespace too (up to you, really).
Once you have the line, use the String.split() method to split it into an array. There are convenient delimiters to split on: "," or " " or ", "; see what works best for you.
Look through the split parts for '/app/Telerik.ReportViewer.axd'; the next element of the array is the value you are looking for.
Because the formatting is consistent, you can rely on it to find your variable. Of course, study the HTML to make sure it really does always follow the same format within the line you are investigating; looking at it, I assume it does.
In short: find your line, split it on a delimiter, and use some logic to pick out the element you are after.
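A minimal sketch of the read-a-line, split, and take-the-next-element approach described above, using only the JDK. It assumes the constructor call starts on a line beginning with var ReportViewer1 and that its arguments are separated by ", " as in the sample; ReportViewerIdFinder and findReportId are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReportViewerIdFinder {

    // The line we want starts with this (after leading whitespace is trimmed).
    static final String LINE_PREFIX = "var ReportViewer1";
    // The value we want is the array element right after this one.
    static final String PIVOT = "'/app/Telerik.ReportViewer.axd'";

    // Returns the argument that follows PIVOT, without its quotes, or null if not found.
    public static String findReportId(String html) {
        try (BufferedReader reader = new BufferedReader(new StringReader(html))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.startsWith(LINE_PREFIX)) {
                    continue;
                }
                // Split the constructor call on ", " (comma followed by a space).
                String[] parts = trimmed.split(", ");
                for (int i = 0; i < parts.length - 1; i++) {
                    if (parts[i].equals(PIVOT)) {
                        // Strip the single quotes around the next element.
                        return parts[i + 1].replace("'", "");
                    }
                }
            }
        } catch (IOException e) {
            // StringReader does not really fail, but readLine() declares IOException.
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<script type=\"text/javascript\">\n"
                + "  var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_CP', "
                + "'/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', {\n"
                + "</script>";
        System.out.println(findReportId(html)); // a90a0d41efa6429eadfefa42fc529de1
    }
}
```

With a real HTTP response, the same method works on the whole body string; for a file, wrap a FileReader instead of the StringReader.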

Using the Java API TransmissionWithRecipientArray object, how can I set an element like a key-value array (SparkPost)?

I'm sending emails using the Java API TransmissionWithRecipientArray object against a template, and I'm having problems with the substitution data. I have tested this data in the template editor, but I don't know how to pass that substitution data using TransmissionWithRecipientArray.
Here is a sample:
(...), "offers": [
{
"description": "dddddddddddddddddd.",
"discount": "ddddddd",
"image": "ddddddddddddddddddddd",
"image_announcer": "dddddddddddddddddddddddddddd",
"alt_title": "dddddddddddddddddddddd",
"tracking": "dhsdjkhsdjksdh",
"name": "sdhsdohdsiosd",
"id": "8480515",
"announcer_paid": "0",
"announcer_image": "test",
"announcer_alt_title": "wdiohdiowdhiowd"
},
{
"description": "dddddddddddddddddd.",
"discount": "ddddddd",
"image": "ddddddddddddddddddddd",
"image_announcer": "dddddddddddddddddddddddddddd",
"alt_title": "dddddddddddddddddddddd",
"tracking": "dhsdjkhsdjksdh",
"name": "sdhsdohdsiosd",
"id": "8480515",
"announcer_paid": "0",
"announcer_image": "test",
"announcer_alt_title": "wdiohdiowdhiowd"
}, (...)
In other words, the question is: what should we pass to the setSubstitutionData() method to get this input as substitution data? We have validated the substitution data using the template editor.
transmission.setSubstitutionData(allSubstitutionData.asJava)
Mandatory HTML:
{{offers[1].description}}
Per the documentation, the way you loop through arrays in a template is:
{{ if offers }}
<ul>
{{ each offers }}
<li>Offer title is <b>{{ loop_var.name }}</b></li>
{{ end }}
</ul>
{{ end }}
You need to use the variable loop_var: if you pass an object in the array, loop_var will be the root of that object. So if you want to print your discount field, you would write loop_var.discount.
There are lots of samples for how to do that sort of thing.
For your specific case, I think you want something like this.
// Assumes the enclosing class provides client (the SparkPost client), getEndPoint() and logger.
private void sendEmail(String from, String[] recipients) throws SparkPostException {
TransmissionWithRecipientArray transmission = new TransmissionWithRecipientArray();
// Populate Recipients
List<RecipientAttributes> recipientArray = new ArrayList<RecipientAttributes>();
for (String recipient : recipients) {
RecipientAttributes recipientAttribs = new RecipientAttributes();
recipientAttribs.setAddress(new AddressAttributes(recipient));
recipientArray.add(recipientAttribs);
}
transmission.setRecipientArray(recipientArray);
// Populate Substitution Data
Map<String, Object> substitutionData = new HashMap<String, Object>();
substitutionData.put("yourContent", "You can add substitution data too.");
transmission.setSubstitutionData(substitutionData);
// You can use Jackson, GSON or whatever your standard JSON library is to
// build this structure.
List<Map<String, String>> offers = new ArrayList<Map<String, String>>();
for (int i = 0; i < 2; i++) {
Map<String, String> offer = new HashMap<String, String>();
offer.put("description", "description value " + i);
offer.put("discount", "discount " + i);
offer.put("image", "image " + i);
offer.put("image_announcer", "image_announcer " + i);
offer.put("alt_title", "alt_title " + i);
offer.put("tracking", "tracking " + i);
offer.put("name", "name " + i);
offer.put("id", "id " + i);
offer.put("announcer_paid", "announcer_paid " + i);
offer.put("announcer_image", "announcer_image " + i);
offer.put("announcer_alt_title", "announcer_alt_title " + i);
offers.add(offer);
}
substitutionData.put("offers", offers);
// Populate Email Body
TemplateContentAttributes contentAttributes = new TemplateContentAttributes();
contentAttributes.setFrom(new AddressAttributes(from));
contentAttributes.setSubject("☰ Your subject content here. {{yourContent}}");
contentAttributes.setText("You could do it for text too. See https://www.sparkpost.com/blog/advanced-email-templates/ for an example");
contentAttributes.setHtml(
"<b>Your Data:</b><br>\n"
+ "<table border='1'>\n"
+ " <tr>\n"
+ " <th>description</th>\n"
+ " <th>discount</th>\n"
+ " <th>image</th>\n"
+ " <th>image_announcer</th>\n"
+ " <th>alt_title</th>\n"
+ " <th>tracking</th>\n"
+ " <th>name</th>\n"
+ " <th>id</th>\n"
+ " <th>announcer_paid</th>\n"
+ " <th>announcer_image</th>\n"
+ " <th>announcer_alt_title</th>\n"
+ " </tr>\n"
+ " {{each offers}} \n"
+ " <tr>\n"
+ " <td> {{{loop_var.description}}} </td>\n"
+ " <td> {{{loop_var.discount}}} </td>\n"
+ " <td> {{{loop_var.image}}} </td>\n"
+ " <td> {{{loop_var.image_announcer}}} </td>\n"
+ " <td> {{{loop_var.alt_title}}} </td>\n"
+ " <td> {{{loop_var.tracking}}} </td>\n"
+ " <td> {{{loop_var.name}}} </td>\n"
+ " <td> {{{loop_var.id}}} </td>\n"
+ " <td> {{{loop_var.announcer_paid}}} </td>\n"
+ " <td> {{{loop_var.announcer_image}}} </td>\n"
+ " <td> {{{loop_var.announcer_alt_title}}} </td>\n"
+ " </tr>\n"
+ " {{ end }} \n"
+ "</table>\n\n");
transmission.setContentAttributes(contentAttributes);
// Send the Email
IRestConnection connection = new RestConnection(this.client, getEndPoint());
Response response = ResourceTransmissions.create(connection, 0, transmission);
logger.debug("Transmission Response: " + response);
}
Thank you guys for your answers.
The issue we had here was the conversion from a Scala Map type to Gson.
Gson processes HashMaps created from Scala Maps differently: the result includes extra fields and the structure of the JSON changes.
The solution is the answer above for Java users; for Scala, first iterate over all the Maps, converting them to Java types like this:
def toJavaConverter(objectLevelSubs: immutable.Map[String, AnyRef]): java.util.LinkedHashMap[String, Object] = {
val output = new java.util.LinkedHashMap[java.lang.String, Object]
objectLevelSubs.foreach {
case (k: String, v: List[Predef.Map[String, AnyRef]]) => output.put(k, v.map(toJavaConverter))
case (k: String, v: Predef.Map[String, AnyRef]) => output.put(k, toJavaConverter(v))
case (k: String, v: AnyRef) => output.put(k, v)
}
output
}
And finally, convert each element like this:
val gson: Gson = new GsonBuilder().setPrettyPrinting().enableComplexMapKeySerialization().create()
val finalSubstitutionData: util.LinkedHashMap[String, AnyRef] = new util.LinkedHashMap[String, AnyRef]()
javaObjectLevelSubs.forEach{
case (k: String, v: String) => finalSubstitutionData.put(k, v)
case (k: String, a) => a match {case l: List[_] => finalSubstitutionData.put(k, l.map(gson.toJsonTree).asJava)}
}
Thanks @Yepher and @balexandre

Unable to select option from dropdown using JavascriptExecutor

Can anyone provide a fail-safe(ish) method for selecting text from the dropdowns on this page I am practicing on?
https://www.club18-30.com/club18-30
Specifically, the 'from' and 'to' airport dropdowns. I am using the following code:
public void selectWhereFrom(String query, String whereFromSelect) throws InterruptedException {
WebElement dropDownContainer = driver.findElement(By.xpath(departureAirportLocator));
dropDownContainer.click();
selectOption(query,whereFromSelect);
}
public void selectOption(String query, String option) {
String script =
"function selectOption(s) {\r\n" +
" var sel = document.querySelector(' " + query + "');\r\n" +
" for (var i = 0; i < sel.options.length; i++)\r\n" +
" {\r\n" +
" if (sel.options[i].text.indexOf(s) > -1)\r\n" +
" {\r\n" +
" sel.options[i].selected = true;\r\n" +
" break;\r\n" +
" }\r\n" +
" }\r\n" +
"}\r\n" +
"return selectOption('" + option + "');";
javaScriptExecutor(script);
}
This seems to populate the box with text, but when I hit 'Search' I get a message saying I need to select an option, which suggests the selection has not been registered.
I would rather avoid JavascriptExecutor, but I haven't been able to make these selects work with Selenium's regular Select mechanism.
I would set up a function for each dropdown, one for setting the departure airport and another for setting the destination airport. I've tested the code below and it works.
The functions
public static void setDepartureAirport(String airport)
{
driver.findElement(By.cssSelector("div.departureAirport div.departurePoint")).click();
String xpath = "//div[contains(@class, 'departurePoint')]//ul//li[contains(@class, 'custom-select-option') and contains(text(), '"
+ airport + "')]";
driver.findElement(By.xpath(xpath)).click();
}
public static void setDestinationAirport(String airport)
{
driver.findElement(By.cssSelector("div.destinationAirport div.airportSelect")).click();
String xpath = "//div[contains(@class, 'destinationAirport')]//ul//li[contains(@class, 'custom-select-option') and contains(text(), '"
+ airport + "')]";
driver.findElement(By.xpath(xpath)).click();
}
and you call them like
driver.get("https://www.club18-30.com/club18-30");
setDepartureAirport("(MAN)");
setDestinationAirport("(IBZ)");
I would suggest that you use the 3-letter airport codes for your search, e.g. "(MAN)" for Manchester. That will be unique to each airport but you can use any unique part of the text.
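The two functions above differ only in the container class inside the XPath, so a small helper can build that XPath once. This is a sketch: AirportXPaths and optionXPath are hypothetical names, and it assumes the option text contains no single quote (it rejects such input rather than escaping it).

```java
public class AirportXPaths {

    // containerClass is "departurePoint" or "destinationAirport";
    // airport is any unique part of the option text, e.g. "(MAN)".
    public static String optionXPath(String containerClass, String airport) {
        if (airport.contains("'")) {
            // The text is embedded in a single-quoted XPath literal, so a quote would break it.
            throw new IllegalArgumentException("airport text must not contain a single quote: " + airport);
        }
        return "//div[contains(@class, '" + containerClass + "')]"
                + "//ul//li[contains(@class, 'custom-select-option') and contains(text(), '"
                + airport + "')]";
    }

    public static void main(String[] args) {
        System.out.println(optionXPath("departurePoint", "(MAN)"));
        System.out.println(optionXPath("destinationAirport", "(IBZ)"));
    }
}
```

setDepartureAirport would then end with driver.findElement(By.xpath(AirportXPaths.optionXPath("departurePoint", airport))).click();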

Drawing a pie chart in an HTML email using Apache Commons Email

I want to send statistical information to my clients showing the number of transactions processed on every terminal or branch. I am using Apache Commons Email to send HTML emails.
I would like to send a pie chart like this one on the site.
My Java code is basically extracted from.
It goes like this:
public void testHtmlEmailPiechart()
throws UnsupportedEncodingException, EmailException, MalformedURLException {
HtmlEmail email = new HtmlEmail();
email.setHostName(emailServer);
email.setSmtpPort(587);
email.setSSLOnConnect(true);
email.setAuthentication(userName, password);
email.setCharset(emailEncoding);
email.addTo(receiver, "Mwesigye John Bosco");
email.setFrom(userName, "Enovate system emailing alert");
email.setSubject("Conkev aml Engine Statistics");
URL url = new URL("https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcROXe8tn1ljtctM53TkLJhLs6gEX56CvL0shvyq1V6wg7tXUDH8KRyVP30");
// URL url = new URL("http://www.apache.org/images/asf_logo_wide.gif");
String cid2 = email.embed(url, "logo.gif");
email.setHtmlMsg("<html>\n" +
" <head>\n" +
" <script type=\"text/javascript\" src=\"https://www.gstatic.com/charts/loader.js\"></script>\n" +
" <script type=\"text/javascript\">\n" +
" google.charts.load(\"current\", {packages:[\"corechart\"]});\n" +
" google.charts.setOnLoadCallback(drawChart);\n" +
" function drawChart() {\n" +
" var data = google.visualization.arrayToDataTable([\n" +
" ['Task', 'Hours per Day'],\n" +
" ['Work', 11],\n" +
" ['Eat', 2],\n" +
" ['Commute', 2],\n" +
" ['Watch TV', 2],\n" +
" ['Sleep', 7]\n" +
" ]);\n" +
"\n" +
" var options = {\n" +
" title: 'My Daily Activities',\n" +
" is3D: true,\n" +
" };\n" +
"\n" +
" var chart = new google.visualization.PieChart(document.getElementById('piechart_3d'));\n" +
" chart.draw(data, options);\n" +
" }\n" +
" </script>\n" +
" </head>\n" +
" <body>\n" +
" <div id=\"piechart_3d\" style=\"width: 900px; height: 500px;\">Piechart Data</div>\n" +
" </body>\n" +
"</html>");
email.setTextMsg("Your email client does not support HTML messages");
email.send();
}
My guess is that the JavaScript is not being recognized, because the rest of the code works: embedded images and font styling both show up in the sample mails I have sent to my own address. I would like your help, or a recommendation of any material I can read, to achieve this as long as I am using Java. The process is automated and runs in the background, so no user interface is involved.
Thanks.

How to get text between two Elements in DOM object?

I'm using JSoup to parse this HTML content:
<div class="submitted">
<strong><a title="View user profile." href="/user/1">user1</a></strong>
on 27/09/2011 - 15:17
<span class="via">www.google.com</span>
</div>
Which looks like this in web browser:
user1 on 27/09/2011 - 15:17 www.google.com
The username and the website can be parsed into variables using this:
String user = content.getElementsByClass("submitted").first().getElementsByTag("strong").first().text();
String website = content.getElementsByClass("submitted").first().getElementsByClass("via").first().text();
But I'm unsure how to get the "on 27/09/2011 - 15:17" into a variable. If I use
String date = content.getElementsByClass("submitted").first().text();
it also contains the username and the website.
You can always remove the user and the website elements like this (you can clone your submitted element if you do not want the remove actions to "damage" your document):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SubmittedText {
public static void main(String[] args) throws Exception {
Document content = Jsoup.parse(
"<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\">www.google.com</span>" +
"</div> ");
// create a clone of the element so we do not destroy the original
Element submitted = content.getElementsByClass("submitted").first().clone();
// remove the elements that you do not need
submitted.getElementsByTag("strong").remove();
submitted.getElementsByClass("via").remove();
// print the result (demo)
System.out.println(submitted.text());
}
}
Outputs:
on 27/09/2011 - 15:17
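Alternatively, you can take the full text of the div and split it on spaces. Here contentString is content.getElementsByClass("submitted").first().text(), i.e. "user1 on 27/09/2011 - 15:17 www.google.com":
String[] parts = contentString.split(" ");
Then you can construct the string you want like this:
String date = parts[1] + " " + parts[2] + " - " + parts[4];
This extracts the string you need ("on 27/09/2011 - 15:17"), as long as the username contains no spaces.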
Select the element before the text you wish to grab, then get its next sibling node (not element), which is a text node:
Document doc = Jsoup.parse("<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\">www.google.com</span>" +
"</div> ");
String str = doc.select("strong").first().nextSibling().toString().trim();
System.out.println(str);
You can also ask an element for its child text nodes and index directly (though referencing the nodes by sibling is usually more robust than indexing):
Document doc = Jsoup.parse(
"<div class=\"submitted\">" +
" <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>" +
" on 27/09/2011 - 15:17 " +
" <span class=\"via\">www.google.com</span>" +
"</div> ");
String str = doc.select("div").first().textNodes().get(1).text().trim();
System.out.println(str);
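Because the date is the div's own text, the ownText() method from the first question on this page also works here, without cloning, removing, or indexing nodes. A minimal sketch assuming jsoup is on the classpath; SubmittedDateParser and extractDate are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SubmittedDateParser {

    // ownText() only reads text nodes that are direct children of div.submitted,
    // so the <strong> (username) and <span class="via"> (website) are skipped.
    public static String extractDate(String html) {
        Document doc = Jsoup.parse(html);
        Element submitted = doc.select("div.submitted").first();
        return submitted == null ? null : submitted.ownText().trim();
    }

    public static void main(String[] args) {
        String html = "<div class=\"submitted\">"
                + " <strong><a title=\"View user profile.\" href=\"/user/1\">user1</a></strong>"
                + " on 27/09/2011 - 15:17 "
                + " <span class=\"via\">www.google.com</span>"
                + "</div> ";
        System.out.println(extractDate(html));
    }
}
```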
