How to do a Google Search with Jsoup - java

i have to use this Code to search in google and Show the result Link,but i dont know how to send My Query
String s="reza";
Document doc = Jsoup.connect("http://google.com/search?q=").userAgent("Mozilla").data(?????).get();
Elements titles = doc.select(".entrytitle");
//print all titles in main page
for(Element e: titles){
System.out.println("text: " +e.text());
System.out.println("html: "+ e.html());
}
//print all available links on page
Elements links = doc.select("a[href]");
for(Element l: links){
System.out.println("link: " +l.attr("abs:href"));
}
}
"reza" is my query that i want to search it in google
How can i use this method to search
my problem is sending query to Google search page

Change Jsoup.connect("http://google.com/search?q=") to Jsoup.connect("http://google.com/search?q=" + s).
And remove the .data(???).

Related

JSOUP - Extract Data Web dinamis

I tried to extract the price. Can anyone please help me? There is no output for the price and its weight ,, I've tried several ways but not out the results
Document doc = Jsoup.connect("https://www.jakmall.com/tokocamzone/mi-travel-charger-20a-output-fast-charging#9730928979371").get();
Elements rows = doc.getElementsByAttributeValue("class", "div[dp__price dp__price--2 format__money]");
System.out.println("rows.size() = " + rows.size());
String index = "";
for (Element span : rows) {
index = span.text();
}
System.out.println("index = " + index);
I've tried another way but I did not get the result. I was very curious but did not find it the right way
if you run this line of code above you will discover thtat there is no price ordiv[dp__price dp__price--2 format__money] DOM. There is only Javascript.
String d = doc.getElementsByClass("dp__header__info").outherHtml();
System.out.println(d);
Jsoup is not able to fetch the price because content is loaded dynamically after page loading. Consider using Selenium which more powerfull and supports JavaScript websites,

Use JSoup to get all textual links

I'm using JSoup to grab content from web pages.
I want to get all the links on a page that have some contained text (it doesn't matter what the text is) just needs to be non-empty/image etc.
Example of links I want:
Link to Some Page
Since it contains the text "Link to Some Page"
Links I don't want:
<img src="someimage.jpg"/>
My code looks like this. How can I modify it to only get the first type of link?
Document document = // I get my document object
Elements linksOnPage = document.select("a[href]")
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}
You could do something like this.
It does it's job though it's probably not the fanciest solution out there.
Note: the function text() gets you a clean text so if there are any HTML code fragements inside it, it won't return them.
Document doc = // get the doc
Elements linksOnPage = document.select("a");
for (Element pageElem : linksOnPage){
String link = "";
if(pageElem.text().trim().equals(""))
continue;
// do smth with it
}
I am using this and it's working fine:
Document document = // I get my document object
Elements linksOnPage = document.select("a:matches(([^\\s]+))");
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}

Elements returns empty string

I am trying to scrape prices of a website with jSoup, but I only get an empty string.
I've tested my code with jSoup Online and I expect <meta itemprop="price" content="6,99"> to be printed when I use the following code:
Document doc = Jsoup.connect(URL).get();
Elements meta = doc.select("meta[itemprop=price]");
System.out.println("meta: " + meta.text());
price = meta.attr("content");
However, I just get an empty string and no error. What am I doing wrong here?
For the ones interested I am trying to scrape the price of this page
Try this:
Document doc = Jsoup.connect(URL).get();
Element meta = doc.select("meta[itemprop=price]").first();
System.out.println("meta: " + meta.text());
String price = meta.attr("content");
The webserver you are trying to access needs another user agent string to respond with the info you want. Try this:
Document doc = Jsoup.connect(URL).userAgent("Mozilla/5.0").get();

Real time web crawling using Jsoup

I have this web page https://rrtp.comed.com/pricing-table-today/ and from that I need to get the information about Time (Hour Ending) and Day-Ahead Hourly Price column alone. I tried with the following code,
Document doc = Jsoup.connect("https://rrtp.comed.com/pricing-table-today/").get();
for (Element table : doc.select("table.prices three-col")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
if (tds.size() > 2) {
System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
}
}
}
but unfortunately I am unable to get the data I need.
Is there something wrong in the code..? or This page can't be crawled...?
Need some help
As I said in comment:
You should hit https://rrtp.comed.com/rrtp/ServletFeed?type=pricingtabledual&date=20150717 because it's source from which data is loaded on the page you have pointed to.
Data under this link is not a valid html document (and this is why it's not working for you), but you can easily make it "quite" right.
All you have to do is first get the response and add <table>..</table> tags around it, then it's enough to parse it as html document.
Connection.Response response = Jsoup.connect("https://rrtp.comed.com/rrtp/ServletFeed?type=pricingtabledual&date=20150717").execute();
Document doc = Jsoup.parse("<table>" + response.body() + "</table>");
for (Element element : doc.select("tr")) {
System.out.println(element.html());
}

adding text before and after a link jSoup

I've just stared learning Jsoup and the cookbook on their website but I'm just a bit stuck with addling text to an element I've parsed.
try{
Document doc = Jsoup.connect(url).get();
Element add = doc.prependText("a href") ;
Elements links = add.select("a[href]");
for (Element link : links) {
PrintStream sb = System.out.format("%n %s",link.attr("abs:href"));
System.out.print("<br>");
}
}
catch(Exception e){
System.out.print("error --> " + e);
}
Example run with google.com I get
http://www.google.ie/imghp?hl=en&tab=wi<br>
http://maps.google.ie/maps?hl=en&tab=wl<br>
https://play.google.com/?hl=en&tab=w8<br>
But I really want
<a href> http://www.google.ie/imghp?hl=en&tab=wi<br></a>
<a href> http://maps.google.ie/maps?hl=en&tab=wl<br></a>
<a href> https://play.google.com/?hl=en&tab=w8<br></a>
With this code I've gotten all the links off the page but I want to also get the and tags so I can them create my on webpage. I've tried adding a string and prepend text but just can't seem to get it right.
Thanks
with link.attr(...) you get the attribute value.
But you need the whole tag:
Document doc = Jsoup.connect(...).get();
for( Element e : doc.select("a[href]") ) // Select all 'a'-Tags with 'href' attribute
{
String wholeTag = e.toString(); // Get a string as the element is
/* No you you can use the html - in this example for a simple output */
System.out.println(wholeTag);
}

Categories

Resources