How to extract Dynamic text from a webpage

I want to extract text from a web page whose content changes frequently. What technologies can I use for this? For example, I want to extract a currency rate that changes every day from a web page and save it in a database. Please let me know if anyone knows how to do this.
Thanks.

You can use JSoup to parse the HTML.
Example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example"
String linkOuterH = link.outerHtml(); // <a href="http://example.com/"><b>example</b></a>
String linkInnerH = link.html(); // "<b>example</b>"
You can select a particular div or any other tag the same way by changing the selector. To read a live page instead of a string, load it with Jsoup.connect(url).get(), as in the related questions below.
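
Because the value changes every day, the extraction also has to run repeatedly. The following is a minimal sketch using only the standard library, not a definitive implementation. It assumes the fetch-and-parse step is wrapped in a hypothetical `Supplier<String>` called `fetchRate` (in practice this could call Jsoup) and the storage step in a hypothetical `Consumer<String>` called `saveRate` (in practice a JDBC insert). The class name `RatePoller` and the sample value are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

class RatePoller {
    // One update cycle: fetch the current value and hand it to the storage step.
    // Returns false instead of throwing, because an uncaught exception inside
    // scheduleAtFixedRate would suppress all later runs.
    static boolean runOnce(Supplier<String> fetchRate, Consumer<String> saveRate) {
        try {
            saveRate.accept(fetchRate.get());
            return true;
        } catch (RuntimeException e) {
            System.err.println("Rate update failed: " + e.getMessage());
            return false;
        }
    }

    // Repeats runOnce at a fixed period; the caller is responsible for
    // shutting the returned scheduler down.
    static ScheduledExecutorService start(Supplier<String> fetchRate,
                                          Consumer<String> saveRate,
                                          long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> runOnce(fetchRate, saveRate), 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        // Placeholder fetcher and in-memory "database", for demonstration only.
        List<String> saved = new ArrayList<>();
        runOnce(() -> "1.0842", saved::add);
        System.out.println("Stored: " + saved);
        // A long-running service would instead call, for example:
        // start(realFetcher, realSaver, 1, TimeUnit.DAYS);
    }
}
```

Keeping the fetch and save steps behind small interfaces means the scheduling logic can be tried out without network or database access.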

Related

I can't use Jsoup in Java

I want to pull the four fields I marked in the table in the picture, and the same fields from the rows that follow, with jsoup. But I couldn't work out which selectors to use.
Here is my code and the website:
https://www.ilan.gov.tr/ilan/kategori/9/ihale-duyurulari
Document doc = Jsoup.connect("https://www.ilan.gov.tr/ilan/kategori/9/ihale-duyurulari").get();
//System.out.println(doc.outerHtml());
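// Class selectors need a leading dot; classes on the same element are chained without spaces.
// The generated ng-tns-* and ng-star-inserted classes are left out because they may change.
// jsoup does not run JavaScript, so rows rendered client-side will not be matched.
for (Element row : doc.select(".search-results-content.row")) {
    final String title = row.select(".list-desc").text();
    final String title1 = row.select(".col.col-4.col-lg-4.col-border").text();
    System.out.println(title);
    System.out.println(title1);
}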

Use JSoup to get all textual links

I'm using JSoup to grab content from web pages.
I want to get all the links on a page that contain some text. It doesn't matter what the text is; it just needs to be non-empty (not only an image, for example).
Example of links I want:
Link to Some Page
Since it contains the text "Link to Some Page"
Links I don't want:
<img src="someimage.jpg"/>
My code looks like this. How can I modify it to only get the first type of link?
Document document = // I get my document object
Elements linksOnPage = document.select("a[href]");
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}

You could do something like this.
It does its job, though it's probably not the fanciest solution out there.
Note: text() returns clean text, so any HTML markup inside the link is not included in the result.
Document document = // get the document
Elements linksOnPage = document.select("a[href]");
for (Element pageElem : linksOnPage) {
    // Skip links with no visible text, such as image-only links.
    if (pageElem.text().trim().isEmpty()) {
        continue;
    }
    String link = pageElem.attr("abs:href");
    // do something with the link
}

I am using this and it's working fine:
Document document = // I get my document object
Elements linksOnPage = document.select("a:matches(([^\\s]+))");
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}

Elements returns empty string

I am trying to scrape prices of a website with jSoup, but I only get an empty string.
I've tested my code with jSoup Online and I expect <meta itemprop="price" content="6,99"> to be printed when I use the following code:
Document doc = Jsoup.connect(URL).get();
Elements meta = doc.select("meta[itemprop=price]");
System.out.println("meta: " + meta.text());
price = meta.attr("content");
However, I just get an empty string and no error. What am I doing wrong here?
For those interested, I am trying to scrape the price from this page

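Try selecting the first match and reading its content attribute. A meta element has no text content, so text() is always empty for it; the price is in the attribute:
Document doc = Jsoup.connect(URL).get();
Element meta = doc.select("meta[itemprop=price]").first();
if (meta != null) { // first() returns null when nothing matches
    String price = meta.attr("content");
    System.out.println("price: " + price);
}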

The web server you are trying to access only returns the information you want when the request carries a different User-Agent string. Try this:
Document doc = Jsoup.connect(URL).userAgent("Mozilla/5.0").get();
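
Once the request succeeds, the `content` attribute comes back as a string such as "6,99", with a decimal comma. Below is a small sketch of turning that string into a number with the standard library. It assumes the site formats prices the way the German locale does, which the question does not state, and the names `PriceParser` and `parsePrice` are made up for illustration.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

class PriceParser {
    // Parses a locale-formatted price such as "6,99" into a BigDecimal.
    // The locale is an assumption supplied by the caller.
    static BigDecimal parsePrice(String raw, Locale locale) {
        DecimalFormat format =
                new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale));
        format.setParseBigDecimal(true); // parse() then returns a BigDecimal

        String trimmed = raw.trim();
        ParsePosition pos = new ParsePosition(0);
        Number parsed = format.parse(trimmed, pos);

        // Reject empty input and trailing garbage instead of parsing a prefix.
        if (parsed == null || pos.getIndex() != trimmed.length()) {
            throw new IllegalArgumentException("Not a valid price: " + raw);
        }
        return (BigDecimal) parsed;
    }

    public static void main(String[] args) {
        BigDecimal price = parsePrice("6,99", Locale.GERMANY);
        System.out.println(price);
    }
}
```

Rejecting the empty string explicitly is useful here, because attr("content") on an empty selection returns "" rather than failing.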

Grabbing information from an html file

OK, I am trying to grab the data-title and href from the following HTML and assign them to variables in Java.
<tr class="pl-video yt-uix-tile " data-video-id="MBBWVgE0ewk" data-set-video-id="" data-title="Windows Command Line Tutorial - 1 - Introduction to the Command Prompt"><td class="pl-video-handle "></td><td class="pl-video-index"></td><td class="pl-video-thumbnail"><span class="pl-video-thumb ux-thumb-wrap contains-addto"><a href="/watch?v=MBBWVgE0ewk&index=1&list=PL6gx4Cwl9DGDV6SnbINlVUd0o2xT4JbMu"

If you don't mind including a dependency, there is a good library for this kind of thing called jsoup.
String html = ...
Document doc = Jsoup.parse(html);
Element tr = doc.select("tr").first();
Element link = tr.select("a").first();
String dataTitle = tr.attr("data-title");
String href = link.attr("href");
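
The `href` in this markup is relative (`/watch?v=...`), so it needs a base URL before it can be requested. Here is a minimal sketch using only `java.net.URI`. The base URL `https://www.youtube.com/` is an assumption inferred from the markup, not something the question states, and the names `LinkResolver` and `toAbsolute` are illustrative.

```java
import java.net.URI;

class LinkResolver {
    // Resolves a possibly relative href against the URL of the page it came from.
    // URI.create throws IllegalArgumentException for hrefs that are not valid
    // URIs (for example, ones containing unescaped spaces).
    static String toAbsolute(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "https://www.youtube.com/"; // assumed page origin
        String href = "/watch?v=MBBWVgE0ewk&index=1&list=PL6gx4Cwl9DGDV6SnbINlVUd0o2xT4JbMu";
        System.out.println(toAbsolute(base, href));
    }
}
```

When the document is fetched with Jsoup.connect(...).get(), reading the attribute as "abs:href" should give the same kind of result, since jsoup then knows the page's base URI.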

jSoup get title from img tag

I have a scenario where I need to pull the title from an img tag like the one below.
<img alt="Bear" border="0" src="/images/teddy/5433.gif" title="Bear"/>
I was able to get the image URL, but how do I get the title from the img tag?
In the example above, the title is "Bear". I want to extract this.

Use Element#attr() to extract arbitrary element attributes.
Element img = selectItSomehow();
String title = img.attr("title");
// ...
See also:
Jsoup Cookbook - Extract attributes, text, and HTML from elements

String html = "<img alt='Bear' border='0' src='/images/teddy/5433.gif' title='Bear'/>";
Document doc = Jsoup.parse(html);
Element e = doc.select("img[title]").first();
String title = e.attr("title");
System.out.println(title);
