I'm making a small Android application for a class where I find cancer-related events from the American Cancer Society's website. I've been using JSoup to get basic information about the events, and to get specific information from the website I've tried to use the select() method. However, the current method that I'm using grabs way more HTML nodes than I would like and I couldn't figure out why. The table that I'm trying to grab looks like this:
EDIT: I realized that the div with id="pnlResults" does not end at that table; it ends after about three more tables, all with information that I would like to grab. Here is the markup again:
<div id="pnlResults">
<h2><span id="lblEventName">American Cancer Society 44th Annual Walter Hagen Golf Tournament</span></h2>
<!-- General Information Box -->
<div class="text-box boxed wide">
<h3 class="head" style="width:97%;">
General Information
</h3>
<div class="content">
<p>
<label>Event Times:</label><span id="lblStartDate">Monday, July 30, 2012</span><span id="lblEndDate"></span><br />
<label> </label><span id="lblStartTime">10:00 AM</span> - <span id="lblEndTime">9:00 PM</span>
</p>
<p>
<label>Time Zone:</label><span id="lblTimeZone">Eastern</span>
</p>
<p>
<label>Description:</label><span id="lblDesc" class="fieldData long">The American Cancer Society Walter Hagen Golf Tournament highlights the Society’s role in supporting research and patient care here in Rochester. Funds raised through this event help us make a difference in patents’ lives every day though programs including Road to Recovery and Patient Navigation as well as support grants to our research institutions. 144 golfers will play a round of golf and then enjoy cocktails, dinner, and silent auction following the tournament. </span>
</p>
<p>
<label>Agenda:</label><span id="lblAgenda" class="fieldData long">10:00am - Check-in, 11:00am - Lunch, 12:15pm - Shot gun start, 6:00 - Cocktails and silent auction, 7:00pm Dinner and program</span>
</p>
</div>
</div>
<div id="pnlStandardDisplay">
<!-- Event Location Box -->
<div class="text-box boxed wide line">
<h3 class="head" style="width:97%;">
Event Location
</h3>
<div class="content" style="display:inline-block; width:97%;">
<div >
<div id="mapOutsideContainer" class="resource-map">
<div id="map_canvas" class="resource-map" ></div>
</div>
<script type="text/javascript">
var mapDataPoints = [{ "lat":43.1075545,"lng":-77.5164518, "title":"Golf Event","content":"<b>American Cancer Society 44th Annual Walter Hagen Golf Tournament<\/b><br/><\/br>4045 East Avenue<br /><br/>Rochester, New York 14618<br /><br />Phone: <br />Fax: "} ];
buildMap(mapDataPoints, -5);
</script>
</div>
<h4><span id="lblLocationName">Irondequoit Country Club</span></h4>
<p>
<label>Address:</label><span id="lblAddress" class="fieldData" style="width:150px;">4045 East Avenue<br />Rochester, New York 14618</span>
</p>
<p>
<label nowrap="nowrap">Handicap Accessible:</label><span id="lblHandicapAccesible">Yes</span>
</p>
</div>
</div>
<!-- Primary Contact Box -->
<div class ="line" >
<div id="eventPrimaryContact_divContact" class="text-box boxed wide">
<h3 class="head" style="width:97%;">
Primary Contact
</h3>
<div class="content">
<p>
<label>Contact:</label><span id="eventPrimaryContact_lblContact">Katerina Kormas (Contact ACS for Details)</span>
</p>
<p>
<label>Contact Type:</label><span id="eventPrimaryContact_lblContactType">ACS Staff</span>
</p>
<p>
<label>Phone:</label><span id="eventPrimaryContact_lblContactPhone">(585) 288-1950</span>
</p>
<p>
<label>Additional Information:</label><span id="eventPrimaryContact_lblContactAddlInfo" class="fieldData long">Direct line is 585-224-4919 or cell 585-645-8912</span>
</p>
</div>
</div>
</div>
<!-- Registration Information Box -->
<div class="text-box boxed wide line">
<h3 class="head" style="width:97%;">
Registration Information
</h3>
<div class="content">
<p>
<label nowrap="nowrap">Registration Required?: </label><span id="lblRegRequired">Yes</span>
</p>
</div>
</div>
<!-- Event Cost Box -->
<div class ="line" >
<div id="eventCost_divCost" class="text-box boxed wide">
<h3 class="head" style="width:97%;">
Event Cost
</h3>
<div class="content">
<p>
<label>Cost/Registration Fee: </label><span id="eventCost_lblCostRegFee" class="fieldData long">$350 per golfer</span>
</p>
<p>
<label>Payment Type: </label><span id="eventCost_lblPaymentTypes" class="fieldData">Cash, Check, American Express, Mastercard, Visa, Discover</span>
</p>
<p>
<label>Check Payable To: </label><span id="eventCost_lblCheckPayable" class="fieldData">American Cancer Society</span>
</p>
<p>
<label>Memo Line: </label><span id="eventCost_lblCheckMemo" class="fieldData">American Cancer Society 44th Annual Walter Hagen Golf Tourna</span>
</p>
<p>
<label>Mail Check To:</label><span id="eventCost_lblCheckMailTo" class="fieldData">American Cancer Society<br />1120 South Goodman St<br />Rochester, New York 14620</span>
</p>
</div>
</div>
</div>
<!-- Tax Deduction Information Box -->
<div class="line">
<div class="text-box boxed wide">
<h3 class="head" style="width:97%;">
Tax Deduction Information
</h3>
<div class="content">
<p>
$210 per golfer is tax deductible
</p>
</div>
</div>
</div>
</div> <!-- end standard display -->
<!-- end daffodil display -->
EDIT: Given these new tables, I would like to extract the General Information and Event Location sections. How would I go about doing that? Could I run select() again on the subset I just got, matching only the sections whose headers are the ones I want?
The code where I'm using the select() is shown below. As I said before, I tried to use
select("div[id=pnlResults]");
but the returned data is much more than just the div where the id is pnlResults.
public ArrayList<Event> results()
{
ArrayList<Event> results = new ArrayList<Event>();
Document doc = Jsoup.parse(page);
Elements links = doc.select("a[href*=event-details]");
for(Element e: links)
{
String title = e.text();
String link = "http://www.cancer.org/involved/participate/app/"+e.attr("href");
try{
Document eventInfo = Jsoup.connect(link).get();
Elements info = eventInfo.select("div[id*=pnlResults");
}
catch(MalformedURLException exception)
{
exception.printStackTrace();
}
catch(IOException exception)
{
exception.printStackTrace();
}
}
return results;
}
Any help would be greatly appreciated.
Try:
Elements info = eventInfo.select("div#pnlResults");
Update for your update:
Since you now have more data, and since the HTML itself isn't that great, you'll just have to work through it to pick out your data. If the elements you need all have id values, use the id attribute of those elements to get their text.
If you want to get the content of the div with id "pnlResults", jsoup provides the getElementById method.
For example, if you want to get that content and put it in a string, you can do it like this:
Document document = Jsoup.connect(LINK_TO_WEBSITE).get();
String content = document.getElementById("pnlResults").outerHtml();
Then you can load this content into Android's WebView, and it works nicely.
Hope this will help someone!
This worked for me:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class DivStuff {
public static final String MY_PAGE = "http://www.cancer.org/Involved/Participate/app" +
"/event-search.aspx?zip=28590&city=&state=&local-radius=20&textsrch=&startdate=" +
"11%2F13%2F2011&enddate=&all=1";
private static final String[] HEADINGS = {"Event", "Location", "City, State", "Date", "Distance"};
private String page;
public static void main(String[] args) throws IOException {
Document doc = Jsoup.connect(MY_PAGE).get();
Elements links = doc.select("table");
Elements links2 = links.select("tr");
if (links2.size() < 2) {
return;
}
for (int i = 1; i < links2.size(); i++) {
Elements innerDetails = links2.get(i).select("td");
if (innerDetails.size() != 5) {
break;
}
for (int j = 0; j < HEADINGS.length; j++) {
System.out.print(HEADINGS[j] + ": ");
if (j == 0) {
System.out.println(innerDetails.get(j).select("a").get(0).text());
} else {
System.out.println(innerDetails.get(j).text());
}
}
System.out.println();
}
}
}
Hi guys, I'm using jsoup in a Java web application in IntelliJ. I'm trying to scrape port call events from a ship-tracking website and store them in a MySQL database. The events are organised in divs with the class name table-group, and the values are in nested divs with the class name table-row. My problem is that the rows for all the vessels share the same class name, and I'm trying to loop through each row and push its data to the database. So far I have managed to create a Java class that scrapes the first row. How can I loop through every row and store those values in my database? Should I create an ArrayList to store the values?
This is my scraper class:
public class Scarper {
private static Document doc;
public static void main(String[] args) {
final String url =
"https://www.myshiptracking.com/ports-arrivals-departures/?mmsi=&pid=277&type=0&time=&pp=20";
try {
doc = Jsoup.connect(url).get();
} catch (IOException e) {
e.printStackTrace();
}
Events();
}
public static void Events() {
Elements elm = doc.select("div.table-group:nth-of-type(2) > .table-row");
List<String> arrayList = new ArrayList();
for (Element ele : elm) {
String event = ele.select("div.col:nth-of-type(2)").text();
String time = ele.select("div.col:nth-of-type(3)").text();
String port = ele.select("div.col:nth-of-type(4)").text();
String vessel = ele.select(".td_vesseltype.col").text();
Event ev = new Event();
System.out.println(event);
System.out.println(time);
System.out.println(port);
System.out.println(vessel);
}
}
}
A sample of the divs I want to scrape:
<div style="box-sizing: border-box;padding: 0px 10px 10px 10px;">
<div class="cs-table">
<div class="heading">
<div class="col" style="width: 10px"></div>
<div class="col" style="width: 110px">Event</div>
<div class="col" style="width: 120px">Time (<span class="tooltip" title="My Time: In your current TimeZone">MT</span>)</div>
<div class="col" style="width: 150px">Port</div>
<div class="col">Vessel</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-sign-out red"></i></div>
<div class="col">Departure</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>BELFAST</div>
<div class="col td_vesseltype"><img src="/icons/icon7_511.png"><span class="padding_18">WILSON BLYTH [GB]</span></div>
</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-flag-checkered green"></i></div>
<div class="col">Arrival</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>HUNTERS QUAY</div>
<div class="col td_vesseltype"><img src="/icons/icon6_511.png"><span class="padding_18">SOUND OF SOAY [GB]</span></div>
</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-sign-out red"></i></div>
<div class="col">Departure</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>LARGS</div>
<div class="col td_vesseltype"><img src="/icons/icon6_511.png"><span class="padding_18">LOCH SHIRA [GB]</span></div>
</div>
</div>
<div class="table-group">
<div class="table-row">
<div class="col"><i class="fa fa-sign-out red"></i></div>
<div class="col">Departure</div>
<div class="col" style="text-align: center;">2022-02-14 <b>16:51</b></div>
<div class="col"><img class="flag_line tooltip" src="/icons/flags2/16/GB.png" title=" United Kingdom"/>RYDE</div>
<div class="col td_vesseltype"><img src="/icons/icon4_511.png"><span class="padding_18">ISLAND FLYER [GB]</span></div>
</div>
</div>
You can start by looping over the table's rows. The selector for the table is .cs-table, so you can get the table with Element table = doc.select(".cs-table").first();. Next, get the table's rows with the selector div.table-row, scoped to that table: Elements rows = table.select("div.table-row");. Now you can loop over all the rows and extract the data from each one. The code should look like:
Element table = doc.select(".cs-table").first(); // null if the page has no .cs-table element
Elements rows = table.select("div.table-row"); // search only inside the table
for (Element row : rows) {
String event = row.select("div.col:nth-of-type(2)").text();
String time = row.select("div.col:nth-of-type(3)").text();
String port = row.select("div.col:nth-of-type(4)").text();
String vessel = row.select(".td_vesseltype.col").text();
System.out.println(event + "-" + time + " " + port + " " + vessel);
System.out.println("---------------------------");
// Do stuff with data here
}
Now it's up to you to decide if you want to keep the data in some array/list inside the loop and use it later, or to insert it directly to your database.
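If you go with the list option, a minimal sketch could look like the following. It only shows the collecting side and deliberately leaves jsoup and the database out; PortCall and PortCallBuffer are hypothetical names made up for this example, and the four fields simply mirror the event, time, port and vessel strings extracted in the loop above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One scraped row. The class and field names are illustrative, not from the original post. */
final class PortCall {
    final String event;
    final String time;
    final String port;
    final String vessel;

    PortCall(String event, String time, String port, String vessel) {
        this.event = event;
        this.time = time;
        this.port = port;
        this.vessel = vessel;
    }
}

/** Collects rows while the loop runs so they can be written to the database afterwards. */
final class PortCallBuffer {
    private final List<PortCall> calls = new ArrayList<>();

    /** Call this from inside the row loop instead of (or as well as) printing. */
    void add(String event, String time, String port, String vessel) {
        // Skip rows where every column came back empty, e.g. a row the selectors did not match.
        if (event.isEmpty() && time.isEmpty() && port.isEmpty() && vessel.isEmpty()) {
            return;
        }
        calls.add(new PortCall(event.trim(), time.trim(), port.trim(), vessel.trim()));
    }

    /** Read-only view of everything collected so far. */
    List<PortCall> all() {
        return Collections.unmodifiableList(calls);
    }
}
```

Inside the loop this would be a single call such as buffer.add(event, time, port, vessel); once the loop has finished, buffer.all() can be handed to whatever persistence code you prefer. Collecting first and writing afterwards keeps the scraping and the database work separate, which tends to make each part easier to test.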
I'm trying to skip one item so that jsoup does not parse it, but the CSS :not selector is not working and I don't understand what is wrong.
My code:
MangaList list = new MangaList();
Document document = getPage("https://3asq.org/");
MangaInfo manga;
for (Element o : document.select("div.page-item-detail:not(.item-thumb#manga-item-5520)")) {
manga = new MangaInfo();
manga.name = o.select("h3").first().select("a").last().text();
manga.path = o.select("a").first().attr("href");
try {
manga.preview = o.select("img").first().attr("src");
} catch (Exception e) {
manga.preview = "";
}
list.add(manga);
}
return list;
html code:
<div class="col-12 col-md-6 badge-pos-1">
<div class="page-item-detail manga">
<div id="manga-item-5520" class="item-thumb hover-details c-image-hover" data-post-id="5520">
<a href="https://3asq.org/manga/gosu/" title="Gosu">
<img width="110" height="150" src="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg" srcset="https://3asq.org/wp-content/uploads/2020/03/IMG_4497-110x150.jpg 110w, https://3asq.org/wp-content/uploads/2020/03/IMG_4497-175x238.jpg 175w" sizes="(max-width: 110px) 100vw, 110px" class="img-responsive" style="" alt="IMG_4497"/> </a>
</div>
<div class="item-summary">
<div class="post-title font-title">
<h3 class="h5">
<span class="manga-title-badges custom noal-manga">Noal-Manga</span> Gosu
</h3>
If I debug your code and print the HTML of the first match (hint: the expression evaluator in IntelliJ IDEA, Alt+F8, is handy for in-session, real-time debugging):
System.out.println(document.select("div.page-item-detail").get(0));
I get:
<div class="page-item-detail manga">
<div id="manga-item-2003" class="item-thumb hover-details c-image-hover" data-post-id="2003">
<a href="http...
...
</div>
</div>
</div>
It looks like you want to extract the next div tag down with class containing item-thumb ... but only if the id isn't manga-item-5520.
So here's what I did to remove that one item
document.select("div.page-item-detail div[class*=item-thumb][id!=manga-item-5520]")
Result size: 19
With the element included:
document.select("div.page-item-detail div[class*=item-thumb]")
Result size: 20
You can also try the following if you want to remain based at the outer div tag rather than the inner div tag.
document.select("div.page-item-detail:has(div[class*=item-thumb][id!=manga-item-5520])")
Good Morning,
I am developing a system that accepts multiple objects at the same time. The technologies used are:
Spring 5.2.5.RELEASE for the backend,
JSP for the front end.
The main goal is to let the user submit 16 records of "Delta P" data with a single form (the number is predefined, so the length of the array is not dynamic).
As "nome_operatore" and "data_operazione_delta_p" are the same in all the records, they are submitted only once and then replicated in the controller for every submitted record.
So far I have come up with the following classes, based on various tutorials here on SO and around the net.
In general, the view is displayed correctly (I inspected the markup generated by Spring and it is correct as far as I can tell), and the GetMapping that displays the form page works correctly as well (I debugged it to check for incorrect data but found no errors).
The only problem is that when I submit the form, the page freezes and after a while Chrome displays a message saying that it is impossible to load the page.
The server is still running and keeps logging correctly, but the line
logger.info("submitted form to create multiple Delta P data");
is never reached.
No errors are displayed in the Chrome console either.
If this is not the correct way to upload multiple items at once, how could this be done in Spring 5?
EDIT
After investigation I found Chrome reporting a RESULT_CODE_HUNG error, but I found nothing useful online to fix it, only people complaining about "Chrome killing pages". Can someone at least explain what this error means? I tried to read up on it, but with no success. The same problem also shows up in Edge and Firefox.
List wrapper
package com.entsorgafin.dto;
import com.entsorgafin.model.Dato_delta_p;
import java.util.ArrayList;
import java.util.List;
public class DeltaPListWrapper
{
private List<Dato_delta_p> deltaPList;
public DeltaPListWrapper()
{
this.deltaPList = new ArrayList<>();
}
public List<Dato_delta_p> getDeltaPList()
{
return deltaPList;
}
public void setDeltaPList(List<Dato_delta_p> deltaPList)
{
this.deltaPList = deltaPList;
}
public void add(Dato_delta_p dato_delta_p)
{
this.deltaPList.add(dato_delta_p);
}
}
Controller methods
/**
* Shows the form to insert a new delta p data series in the system.
* <p>
* Returns the form page.
*
* @param model ModelMap of the UI
* @return The form page to insert one record for each sector
*/
@GetMapping("/addDeltaP")
public String addDeltaP(ModelMap model)
{
logger.info("adding Delta P data");
logger.debug("finding infos for front end representation");
//finding users to relate the records with
List<Utente> users = utentiService.findAllUsers();
logger.debug("found " + users.size() + " users");
Map<Integer, String> userForFE = new HashMap<>();
for(Utente utente : users)
{
userForFE.put(utente.getId_utente(), utente.getNome() + " " + utente.getCognome());
}
model.addAttribute("users", userForFE);
//finding active sectors
List<Settore> activeSectors = new ArrayList<>();
activeSectors.addAll(settoriService.findActiveSectorForPhase("act"));
activeSectors.addAll(settoriService.findActiveSectorForPhase("cur"));
logger.debug("found " + activeSectors.size() + " active sectors");
//creating wrapper which contains multiple Delta P records
DeltaPListWrapper listWrapper = new DeltaPListWrapper();
//Pre-filling sector field for delta P data
for(Settore sect : activeSectors)
{
Dato_delta_p dato_delta_p = new Dato_delta_p();
dato_delta_p.setSettore(sect);
listWrapper.add(dato_delta_p);
}
model.addAttribute("deltaPData", listWrapper);
model.addAttribute("activeSectorNumber", activeSectors.size());
return "uploadDeltaPData";
}
/**
* Saves a new series of data record in the database.
*
* @param listWrapper List of Delta p data to create
* @return Returns the homepage
*/
@PostMapping("/addDeltaP")
public String addDeltaP(@ModelAttribute("deltaPData") DeltaPListWrapper listWrapper)
{
logger.info("submitted form to create multiple Delta P data");
/*
getting Date of the first record, operations are performed on the same date, so
every record will have the same property for data_operazione
The same stands for the user who performed the operations
*/
LocalDate dataOperazione = listWrapper.getDeltaPList().get(0).getData_operazione_delta_p();
Utente idUtente = listWrapper.getDeltaPList().get(0).getUtente_id_utente();
/*
Filling delta P data with active batch for the sector they are from
*/
for(Dato_delta_p dato_delta_p : listWrapper.getDeltaPList())
{
dato_delta_p.setUtente_id_utente(idUtente);
dato_delta_p.setData_operazione_delta_p(dataOperazione);
String phase;
Settore used = settoriService.findSectorById(dato_delta_p.getSettore().getId_settore());
if(used.getFase().equals("act"))
{
phase = "act";
} else
{
phase = "cur";
}
Lotto referencedLotto = lottoService
.findActiveBatchInSectorAndDate(dato_delta_p.getData_operazione_delta_p(), dato_delta_p
.getSettore(), phase);
logger.debug("found Lotto with ID " + referencedLotto.getId_lotto() + " for Delta P record");
dato_delta_p.setLotto_id_lotto(referencedLotto);
//creating the data
dati_delta_pService.createDeltaPData(dato_delta_p);
logger.info("Delta P data created correctly");
}
return "redirect:/entsorgafin/home";
}
JSP view
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%@ taglib prefix="form" uri="http://www.springframework.org/tags/form" %>
<jsp:include page="header.jsp"/>
<div class="container">
<h1>Inserisci i dati delta P</h1>
<form:form method="post" modelAttribute="deltaPData" onsubmit="enableFields(${activeSectorNumber})">
<div class="row">
<div class="form-group col">
<label for="utente_id_utente">Nome dell'operatore</label>
<form:select class="form-control" id="utente_id_utente" path="${deltaPList[0].utente_id_utente.id_utente}" required="required">
<form:option value="" />
<form:options items="${users}" />
</form:select>
</div>
<div class="form-group col">
<label for="data_operazione_delta_p">Data dell'operazione</label>
<form:input path="${deltaPList[0].data_operazione_delta_p}" type="date" class="form-control" id="data_operazione_delta_p" required="required" />
</div>
</div>
<div class="row text-center">
<div class="col my-auto">Numero del settore</div>
<div class="col my-auto">Valore deltaP rilevato</div>
<div class="col my-auto">Velocità ventilatore</div>
<div class="col my-auto">Settore pieno</div>
<div class="col my-auto">Settore in caricamento</div>
</div>
<c:forEach items="${deltaPData.deltaPList}" varStatus="i">
<form:input path="deltaPList[${i.index}].id_dato_delta_p" type="hidden" id="id_dati_delta_p" />
<div class="row text-center">
<div class="form-group col">
<form:input path="deltaPList[${i.index}].settore.id_settore" class="form-control text-center" id="settore${i.index}" required="required" disabled="true"/>
</div>
<div class="form-group col">
<form:input path="deltaPList[${i.index}].valore_delta_p" type="number" step="0.01" class="form-control" id="valore_delta_p" required="required" />
</div>
<div class="form-group col">
<form:input path="deltaPList[${i.index}].velocita_ventilatore" type="number" step="0.01" class="form-control" id="velocita_ventilatore" required="required" />
</div>
<div class="form-group col my-auto">
<form:checkbox path="deltaPList[${i.index}].stato_settore_pieno" id="stato_settore_pieno${i.index}" value="true" onclick="disableSettoreCaricoBox(${i.index})"/>
</div>
<div class="form-group col my-auto">
<form:checkbox path="deltaPList[${i.index}].stato_settore_carico" id="stato_settore_carico${i.index}" value="true" onclick="disableSettorePienoBox(${i.index})"/>
</div>
</div>
</c:forEach>
<input type="submit" value="create" class="btn btn-primary btn-sm">
</form:form>
</div>
<jsp:include page="footer.jsp"/>
<script>
function disableSettoreCaricoBox(i)
{
const checked = document.getElementById('stato_settore_pieno' + i).checked;
document.getElementById("stato_settore_carico" + i).disabled = !!checked;
}
function disableSettorePienoBox(i)
{
const checked = document.getElementById('stato_settore_carico' + i).checked;
document.getElementById("stato_settore_pieno" + i).disabled = !!checked;
}
function enableFields(i)
{
for(let x = 0; x < i; x++)
{
document.getElementById("settore" + x).disabled = false;
}
}
</script>
I actually feel like a bit of an idiot for not noticing it before: after focusing on possible Spring errors, I overlooked a minor point in the code where the error actually is, the JavaScript function.
function enableFields(i)
{
for(let x = 0; x < i; i++)
{
document.getElementById("settore" + x).disabled = false;
}
}
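The loop increments i instead of x, so the condition x < i never becomes false and the page hangs in an infinite loop. It should be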
function enableFields(i)
{
for(let x = 0; x < i; x++)
{
document.getElementById("settore" + x).disabled = false;
}
}
I'll correct the question code. I hope this can still help someone who is looking for an example of multi-row submission in Spring 5.
I want to extract some data from many Xbox store links. The problem I am experiencing is that the section where the price is shown has a different structure depending on, for example, whether the game is discounted.
The code I have written to scrape the price:
String urlPage = "https://www.microsoft.com/en-us/store/p/call-of-duty-advanced-warfare-gold-edition/c20hl06x0v8w" ;
System.out.println("Checking entries of: "+urlPage);
if (getStatusConnectionCode(urlPage) == 200) {
Document document = getHtmlDocument(urlPage);
Elements entradas = document.select("div.m-product-detail-hero-product-placement div.price-info");
for (Element elem : entradas) {
String titulo = elem.getElementsByClass("srv_saleprice").text();
}
}else{
System.out.println("The status code is not OK, it is: "+getStatusConnectionCode(urlPage));
}
The HTML for a game that has no discount:
URL for first case
<div class="price-info">
<div class="c-price">
<div class="price-text srv_price">
<div class="ea-vault-message hidden x-hidden">
<div>
Available in The Vault
</div>
<div>
or
</div>
</div>
<span>$59.99</span>
<sup>+</sup>
</div>
<div class="srv_microdata" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="price" content="59.99">
<meta itemprop="priceCurrency" content="USD">
</div>
</div>
</div>
And for a game with discount:
URL for the second case
<div class="price-info">
<div class="c-price">
<div class="price-text srv_price">
<div class="ea-vault-message hidden x-hidden">
<div>
Available in The Vault
</div>
<div>
or
</div>
</div>
<s class="srv_saleprice" aria-label="Full price was $159.99">$159.99</s>
<span> </span>
<div class="price-disclaimer">
<span>$135.99</span>
<sup>+</sup>
</div>
<span> </span>
<span></span>
</div>
<div class="caption text-muted srv_countdown">
<span class="sub">save $24.00</span>
</div>
<div class="srv_microdata" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<meta itemprop="price" content="135.99">
<meta itemprop="priceCurrency" content="USD">
</div>
</div>
</div>
In this second example the displayed price is $135.99, which is not the game's base price ($159.99 in this case).
How can I extract only the base price for every game, with or without a discount?
I have some Element eNews. After finding indexes by CssQuery I have to select sibling elements with index less than y and greater than x;
Elements lines = eNews.select("div.clear");
int x = lines.get(0).elementSiblingIndex();
int y = lines.get(1).elementSiblingIndex();
Elements tNews = eNews.getElementsByIndexGreaterThan(x)
?AND?
eNews.getElementsByIndexLessThan(y)
This is some sample code. I want to extract the text from the HTML tags between the first and second <div class="clear"></div>:
<div class="aktualnosci">
<div class="zd">
<a href="/Data/Thumbs/ODAweDYwMA,dsc_0458.jpg" title="" rel="lightbox">
<img src="/Data/Thumbs/dsc_0458.jpg"/>
</a>
<p class="show"></p>
</div>
<h3>Awanse</h3>
<div class="data">
<img alt="" src="/Themes/kalendarz-ico.gif">
2013-11-18 12:26
</div>
<!--Start tag-->
<div class="clear"></div>
<!--Tags to extract-->
<p class="gr">W związku z Narodowym Świętem Niepodległości ....</p>
<p style="text-align: justify">W zeszły p....</p>
<p style="text-align: justify">OISW Kraków</p>
<!--End tag-->
<div class="clear"></div>
<div class="slider">
<span class="slide-left"></span>
<span class="slide-right"></span>
</div>
</div>
You can use a selector like div.clear ~ :gt(1):lt(4)
E.g.:
Elements tNews = eNews.select("div.clear ~ :gt(1):lt(4)");
See this example and the selector docs. (It's a bit hard to validate this does what you're trying to achieve without knowing your input HTML and the data you're trying to extract.)
Update based on your edit: there are a couple ways to do this if you can't know the indexes in advance. Below I get the first div, then accumulate sibling elements until we hit the next div.clear. (I'll have a think if I can generify this pattern and add it to jsoup.)
Document doc = Jsoup.parse(h); // h is a String holding the HTML from the question
Element firstDiv = doc.select("div.clear").first();
Elements news = new Elements();
Element item = firstDiv.nextElementSibling();
while (item != null && !(item.tagName().equals("div") && item.className().equals("clear"))) {
news.add(item);
item = item.nextElementSibling();
}
System.out.println(String.format("Found %s items", news.size()));
for (Element element : news) {
System.out.println(element.text());
}
Outputs:
Found 3 items
W związku z Narodowym Świętem Niepodległości ....
W zeszły p....
OISW Kraków