Extract data from multiple classes in Selenium using Java

The objective is to extract reviews from an e-commerce website. How should I proceed to extract data from multiple classes using Selenium and then apply a for loop? Do I have to create an XPath with all the classes, and if so, what should the syntax be? A few of the classes contain data as strings and others contain integers.
Flipkart review class details:
class="_2xg6Ul" Brilliant
class="qwjRop" Best camera in smartphone period. Have note 8 and iPhone X also but pixel 2 with single lens beats them hands down
class="_3LYOAd _3sxSiS" Flipkart Customer
class="_3LYOAd" 29 Nov, 2017
class="_1_BQL8" 142

Based on the very limited information you provided, this is what I came up with. You'll have to provide more information, such as the code you already have and the full HTML for each element.
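// Assumes an existing WebDriver "driver"; needs java.util.List, org.openqa.selenium.By and org.openqa.selenium.WebElement
// "@class" (not "#class") selects the class attribute; "//*" matches any element, whereas "//body" only matches <body>
List<WebElement> list = driver.findElements(By.xpath("//*[contains(@class, '_')]"));
// Iterate through the matched elements and print their visible text
for (WebElement review : list) {
    System.out.println(review.getText());
}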
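On the "what should the syntax be" part: one way to query several classes at once is an XPath union (|) of class tests. The following is a minimal sketch, assuming the class names listed in the question (Flipkart's generated class names may well have changed since). It builds the union string and, so that it runs offline, evaluates it with the JDK's javax.xml.xpath against a small hand-written XHTML snippet standing in for the real page; against the live site the same string would go to driver.findElements(By.xpath(...)). ReviewXPath and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ReviewXPath {
    // Builds one XPath that matches any element carrying ANY of the given classes.
    // concat(' ', normalize-space(@class), ' ') pads the class list with spaces so that
    // ' _3LYOAd ' matches class="_3LYOAd _3sxSiS" but not a class such as "_3LYOAdX".
    static String anyOfClasses(String... classes) {
        List<String> tests = new ArrayList<>();
        for (String c : classes) {
            tests.add("//*[contains(concat(' ', normalize-space(@class), ' '), ' " + c + " ')]");
        }
        return String.join(" | ", tests);
    }

    // Offline stand-in for driver.findElements(By.xpath(xpath)): evaluates the XPath on a
    // well-formed XHTML string and returns the trimmed text of every matching element.
    static List<String> texts(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + xpath, e);
        }
    }
}
```

A union returns every match in one flat list, so the fields of all reviews arrive together and have to be regrouped per review; if each review sits in a common container element, looping over the containers and calling findElement relative to each one keeps a review's fields together. Values such as "142" come back as text, so use Integer.parseInt where a number is needed.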

Related

Scrape links from a list in Scrapy, or create a loop?

I want to scrape this website: https://www.racingpost.com/results for the results.
I already have a crawler that scrapes and follows the links on the results page, but I cannot go further back than the six or seven days that are displayed on the site. The older results are available via the "resultsfinder", which is sadly JavaScript, as are other sources for the older races, like the form of the horses.
I already tried to learn to scrape JavaScript to get the links, and while it is very interesting, I am wondering if there is an easier way, as the result page addresses are designed in a very convenient way:
It's simply https://www.racingpost.com/results/ + a date such as 1990-02-08 or 2021-02-11.
So I thought it might be easier to have the spider get its links from a loop or a predefined list of links.
How could I design a loop in Scrapy that runs from 1990-01-01 up to now, or is it better to create a predefined list of links for this?
Generate the dates in the spider and append them to the link, no need to create a predefined list of links.
from datetime import date, timedelta
# Initialize variables
start_date = date(1990, 1, 1)
end_date = date.today()
crawl_date = start_date
base_url = "https://www.racingpost.com/results/"
links = []
# Generate the links
while crawl_date <= end_date:
    links.append(base_url + str(crawl_date))
    crawl_date += timedelta(days=1)
Then loop through the generated list, or alternatively just call the parse function from the while-loop instead of adding the links to a list.
Example results:
>>> links
[
'https://www.racingpost.com/results/1990-01-01',
'https://www.racingpost.com/results/1990-01-02',
'https://www.racingpost.com/results/1990-01-03',
'https://www.racingpost.com/results/1990-01-04',
'https://www.racingpost.com/results/1990-01-05',
...
]
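The answer's code is Python; since the rest of this page uses Java, here is a hedged Java sketch of the same date walk, written lazily in the spirit of the "call the parse function from the loop" alternative, so no list covering three decades is built up front. It assumes Java 9+ (for LocalDate.datesUntil), and ResultLinks is a hypothetical name. LocalDate.toString() produces the same yyyy-MM-dd form as Python's str(date).

```java
import java.time.LocalDate;
import java.util.stream.Stream;

final class ResultLinks {
    static final String BASE_URL = "https://www.racingpost.com/results/";

    // Lazily yields one results URL per day from start to end, both inclusive.
    // datesUntil excludes its argument, hence end.plusDays(1).
    static Stream<String> between(LocalDate start, LocalDate end) {
        return start.datesUntil(end.plusDays(1)).map(d -> BASE_URL + d);
    }
}
```

For example, ResultLinks.between(LocalDate.of(1990, 1, 1), LocalDate.now()).forEach(...) hands each URL to whatever fetches and parses the page, one day at a time.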

PMML model created from xgboost in R gives different results than the original model in R

I have a ranking task, where my training data looks like this:
session_id item_id item_features target
---------------------------------------------
session1 item1 ... 1
session1 item2 ... 0
...
sessionN item1 ... 0
sessionN itemX ... 10
sessionN itemY ... 0
...
I am using xgboost in R with the objective "rank:pairwise" for training the model. xgboost expects grouped data (same session_id) to be bunched together in the training and test sets. The group structure, i.e. how many consecutive rows belong to each session_id, has to be specified with setinfo() on the DMatrix (e.g. setinfo(dtrain, 'group', group_info)).
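For reference, the 'group' info passed to setinfo() is a vector of group sizes: the number of consecutive rows in each query group, in row order, summing to the number of rows. A small Java sketch that derives it from an ordered session_id column and rejects data whose sessions are not contiguous; GroupSizes is a hypothetical helper, not part of xgboost or jpmml, and it assumes no null ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GroupSizes {
    // Run-length encodes an ordered session_id column into per-group row counts.
    // Throws if a session reappears after another one started, because each
    // group's rows must be contiguous for the sizes to describe the data.
    static List<Integer> fromSessionIds(List<String> sessionIds) {
        List<Integer> sizes = new ArrayList<>();
        Set<String> closed = new HashSet<>();
        String current = null;
        for (String id : sessionIds) {
            if (id.equals(current)) {
                int last = sizes.size() - 1;
                sizes.set(last, sizes.get(last) + 1);
            } else {
                if (current != null) {
                    closed.add(current);
                }
                if (closed.contains(id)) {
                    throw new IllegalArgumentException("rows of session " + id + " are not contiguous");
                }
                current = id;
                sizes.add(1);
            }
        }
        return sizes;
    }
}
```

For the rows session1, session1, session2, session3, session3, session3 this yields [2, 1, 3].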
When I evaluate the model in R, applying new data works perfectly. However, I have used the package pmml to convert the model into a pmml file in order to use it in Java.
In Java the PMML file gets parsed and evaluated via the org.jpmml pmml-evaluator dependency (v. 1.3.15). Feeding the same data as in R to the org.jpmml.evaluator.Evaluator yields different results, though. The results are mostly negative values, which is not a valid result in my setup: all predicted targets should be positive.
I have come up with two possible explanations:
1. There might be a bug in the PMML conversion in my scenario.
2. I have no idea where I can apply the equivalent of setinfo() in Java. Since I am only applying the model to a single session at a time, I was under the impression that I did not need to specify it. But maybe I was wrong.
I can send a fully working example, including training and test data, by mail. But for starters, here is the R code for training the model:
library(xgboost)
example_matrix_train <- xgb.DMatrix(X, label = y)
setinfo(example_matrix_train, 'group', example_train_groupInfo)
example.model <- xgboost(data = example_matrix_train, objective = "rank:pairwise", max.depth = 8, eta = 0.2, nthread = 8, nround = 10, verbose=0)
library(pmml)
library(pmmlTransformations)
xgb.dump(example.model, "example.model.dumped.trees")
logfile <- file(paste0("pmml_example_model",Sys.Date(),".txt"), open="a")
sink(logfile)
pmml(example.model, inputFeatureNames = colnames(example_train), outputLabelName = "prediction1", xgbDumpFile = "example.model.dumped.trees")
sink()
Any help is welcome
"I have come up with two possible explanations: There might be a bug in the pmml conversion"
This is the true explanation: the pmml package is producing incorrect PMML for XGBoost models. The technical reason is that it uses the XGBoost text dump file as input, but the information contained therein is incomplete (e.g. rounded threshold values).
If you're looking to export XGBoost models into PMML, then you should be using the r2pmml package, which uses XGBoost binary files as input.
In truth, the 'pmml' package currently does not support the 'rank:pairwise' objective function you need. The upcoming release of the 'pmml' package (version 1.5.3) includes a check for unsupported objective functions.

jcrfsuite training file format

From what I understand from the POS tagging example in the jcrfsuite examples, the training file is tab-separated and the first token is the label. But I do not get the BigCluster| thing. Can somebody help me with how to specify tokens in the training file?
Training file example:
O BigCluster|00 BigCluster|0000 BigCluster|000000 BigCluster|00000000 BigCluster|0000000000 BigCluster|000000000000 BigCluster|00000000000000 BigCluster|0000000000000000 NextBigCluster|0100 NextBigCluster|01000101 NextBigCluster|010001011111 POSTagDict|D POSTagDict|N POSTagDict|^ POSTagDict|$ POSTagDict|G NextPOSTag|V 1gramSuff|i 1gramPref|i prevword| prevcurr||i nextword|predict nextword|predict currnext|i|predict Word|I Lower|i Xxdshape|X charclass|1, first-shortcap prevnext||predict t=0
Test file format:
! BigCluster|01 BigCluster|0110 BigCluster|011011 BigCluster|01101100 BigCluster|0110110011 BigCluster|011011001100 BigCluster|01101100110000 BigCluster|0110110011000000 NextBigCluster|1000 NextBigCluster|10001000 NextBigCluster|100010000000 POSTagDict|V NextPOSTag|, metaph_POSDict|N 1gramSuff|n 2gramSuff|nn 3gramSuff|mnn 4gramSuff|mmnn 5gramSuff|mmmnn 6gramSuff|ammmnn 7gramSuff|aammmnn 8gramSuff|aaammmnn 9gramSuff|daaammmnn 1gramPref|d 2gramPref|da 3gramPref|daa 4gramPref|daaa 5gramPref|daaam 6gramPref|daaamm 7gramPref|daaammm 8gramPref|daaammmn 9gramPref|daaammmnn prevword| prevcurr||daaammmnn nextword|. nextword|. currnext|daaammmnn|. Word|Daaammmnn Lower|daaammmnn Xxdshape|Xxxxxxxxx charclass|1,2,2,2,2,2,2,2,2, first-initcap prevnext||. t=0
Everything after the label is a list of attributes (features), one per tab-separated token. In this example each attribute is written as a feature name and a value joined by | (e.g. BigCluster|0110), but CRFsuite treats the whole token as a single attribute name.
It is a sparse representation rather than a tabular one: each item lists only the attributes that apply to it.
BigCluster is just one of the features, and it is relevant to this specific example only. You should create your own features if you are training from scratch.
I have noticed that CRFsuite does not care about the naming convention or feature design of labels and attributes, because it treats them as plain strings.
CRFsuite learns weights of associations (feature weights) between attributes and labels without knowing the meaning of either. In other words, you can design and use arbitrary features just by writing label and attribute names in the data set. Find the best possible attributes for your task, run some experiments with different sets of attributes and features, and you will be good to go.
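A minimal sketch of writing such lines yourself, assuming the name|value convention of the example above. It also assumes jcrfsuite reads the same data format as the crfsuite command-line tool: label first, tab-separated attributes, ':' reserved for an optional attribute weight (so a literal ':' or '\' inside a name is escaped as '\:' or '\\'), and an empty line between sequences. Check jcrfsuite's loader before relying on weights. CrfLine is a hypothetical helper.

```java
import java.util.StringJoiner;

final class CrfLine {
    // ':' introduces an optional weight and '\' escapes, so literal occurrences
    // inside an attribute name are escaped (backslash first, then colon).
    static String escape(String attribute) {
        return attribute.replace("\\", "\\\\").replace(":", "\\:");
    }

    // Builds one item line: the label, then one tab-separated attribute per
    // (name, value) pair, written as name|value like BigCluster|0110.
    static String item(String label, String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringJoiner line = new StringJoiner("\t");
        line.add(label);
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            line.add(escape(nameValuePairs[i] + "|" + nameValuePairs[i + 1]));
        }
        return line.toString();
    }
}
```

Write one such line per token and an empty line after the last token of each sentence; repeated names such as several BigCluster prefixes are simply repeated pairs.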

How to read and display particular XML data in Android

I just want to know which game is being played.
One URL is wwww.something.com/data.xml, which contains:
<?xml version="1.0" encoding="UTF-8"?>
<Cricket>
<WC Group="A" Day="Sunday">
<DayMatch>Aus Vs Ind</DayMatch>
<NightMatch>Ban Vs S A</NightMatch>
</WC>
<WC Group="A" Day="Monday">
<DayMatch>Ind Vs Ban</DayMatch>
<NightMatch>Aus Vs S A</NightMatch>
</WC>
<WC Group="B" Day="Sunday">
<DayMatch>Eng VS NZ</DayMatch>
<NightMatch>Pak Vs Zim</NightMatch>
</WC>
<WC Group="B" Day="Monday">
<DayMatch>Pak VS Eng</DayMatch>
<NightMatch>Zim Vs NZ </NightMatch>
</WC>
</Cricket>
Now, given the input Group A and Day Monday, I want to output Group A's full Monday fixture (day and night games), like this:
Ind Vs Ban
Aus Vs S A
Any ideas? How can I access the Group and Day values in the XML?
Declare an id for each group. Based on the id and group name, you can then write conditions in code.
Parse this file with a SAX parser, put the entries into a list of a WC model class that you create (List<WC>), and then search that list.
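A minimal sketch of that SAX approach, using SAXParserFactory from javax.xml.parsers (available on Android as well as desktop Java). The WC class mirrors one <WC> element and is a hypothetical model, as are the parse and find helpers; the day comparison ignores case. It expects well-formed XML, since a SAX parser stops at the first mismatched tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class WcFixtures {
    // Hypothetical model for one <WC Group=".." Day=".."> element.
    static final class WC {
        String group, day, dayMatch, nightMatch;
    }

    static List<WC> parse(String xml) {
        List<WC> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private WC current;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0); // collect only the text of the element just opened
                if ("WC".equals(qName)) {
                    current = new WC();
                    current.group = attrs.getValue("Group");
                    current.day = attrs.getValue("Day");
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("DayMatch".equals(qName)) current.dayMatch = text.toString().trim();
                else if ("NightMatch".equals(qName)) current.nightMatch = text.toString().trim();
                else if ("WC".equals(qName)) result.add(current);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fixtures", e);
        }
        return result;
    }

    // Returns the fixture for the given group and day, or null if there is none.
    static WC find(List<WC> fixtures, String group, String day) {
        for (WC wc : fixtures) {
            if (group.equals(wc.group) && day.equalsIgnoreCase(wc.day)) return wc;
        }
        return null;
    }
}
```

The String input is a simplification: on Android you would download the feed off the main thread, then call find(fixtures, "A", "Monday") and display dayMatch and nightMatch.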

HTML text extraction in J2ME

I have a String from an HTML web page like this:
String htmlString =
<span style="mso-bidi-font-family:Gautami;mso-bidi-theme-font:minor-bidi">President Pranab pay great
tributes to Motilal Nehru on occasion of
</span>
150th birth anniversary. Pranab said institutions evolved by
leaders like him should be strengthened instead of being destroyed.
<span style="mso-spacerun:yes">
</span>
He listed his achievements like his role in evolving of Public Accounts Committee and protecting independence of
Legislature from the influence of the Executive by establishing a separate cadre for the Central Legislative Assembly,
the first set of coins and postal stamps released at the function to commemorate the event.
</p>
I need to extract the text from the above String. After extraction, my output should look like this:
Output:
President Pranab pay great tributes to Motilal Nehru on occasion of 150th birth anniversary. Pranab said institutions evolved by leaders like him should be strengthened instead of being destroyed. He listed his achievements like his role in evolving of Public Accounts Committee and protecting independence of Legislature from the influence of the Executive by establishing a separate cadre for the Central Legislative Assembly, now Parliament. Calling himself a student of history, he said Motilal's Swaraj Party acted as a disciplined assault force in the Legislative Assembly and he was credited with evolving the system of a Public Accounts Committee which is now one of the most effective watchdogs over executive in matters of money and finance. Mukherjee also received the first set of coins and postal stamps released at the function to commemorate the event.
For this I have used the following logic:
int spanIndex = content.indexOf("<span");
spanIndex = content.indexOf(">", spanIndex);
int endspanndex = content.indexOf("</span>", spanIndex);
content = content.substring(spanIndex + 1, endspanndex);
and my resulting output is:
President Pranab pay great tributes to Motilal Nehru on occasion of
I have tried different HTML parsers, but none of them work in J2ME.
Can anyone help me get the full description text? Thanks.
If you are using BlackBerry OS 5.0 or later you can use the BrowserField to parse HTML into a DOM document.
You may continue the same way as you propose with the rest of the string. Alternatively, a simple finite-state automaton would solve this. I have seen such a solution in the moJab project (its sources can be downloaded from the project). In the mojab.xml package there is a minimalistic XML parser designed for J2ME, and it should parse your example as well. Take a look at the sources; it's just three simple classes and seems to be usable without modifications.
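The finite-state-automaton idea can be sketched in a few lines: walk the string once, track whether you are inside a tag, copy everything else, and collapse runs of whitespace into one space so the line breaks in the markup disappear, as in the desired output. It sticks to String.charAt and StringBuffer, which CLDC provides, and avoids regular expressions, which J2ME lacks. This is not the moJab parser; HtmlText is a hypothetical name, and the sketch assumes no '>' inside attribute values and does not decode entities such as &nbsp;.

```java
final class HtmlText {
    // Two states: inside a tag (skip until '>') or in text (copy).
    // Whitespace runs become one space; leading and trailing whitespace is dropped.
    static String extract(String html) {
        StringBuffer out = new StringBuffer();
        boolean inTag = false;
        boolean pendingSpace = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<') {
                inTag = true;
            } else if (c == ' ' || c == '\n' || c == '\r' || c == '\t') {
                // remember the gap, but only once some text has been written
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Tags are not treated as word breaks, so "a<br>b" becomes "ab"; if your markup relies on tags to separate words, set pendingSpace when a tag closes.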
Since J2ME does not support the usual HTML parsers, we can extract the text manually, like this:
private String removeHtmlTags(String content) {
    int beginTag;
    // Repeatedly cut out the first "<...>" tag until none remain
    while ((beginTag = content.indexOf('<')) != -1) {
        // Look for '>' only after the '<'; searching from the start could hit an earlier '>' in the text
        int endTag = content.indexOf('>', beginTag);
        if (endTag == -1) {
            // Unterminated '<' (e.g. a literal less-than sign): stop instead of looping forever
            break;
        }
        content = content.substring(0, beginTag) + content.substring(endTag + 1);
    }
    return content;
}
JSoup is a very popular library for extracting text from HTML documents (for example, Jsoup.parse(html).text()), though it targets standard Java rather than J2ME.
