Getting only the Text and Image from a link

As you can see in my screenshot, I have a WebView that I fill with a link taken from news websites. Is there a way to get only the text of the news article and its image instead of the whole page? When the whole web page is shown, the app just looks like a browser.

Try getting the RSS feed from the news site, for example http://feeds.bbci.co.uk/news/rss.xml, or use the News API (https://newsapi.org/). An RSS feed gives you a list of news items in XML format, which you have to parse to get the title, summary and image of each item. Nearly every newspaper publishes a similar RSS feed, so you don't need a different technique for each newspaper. The News API returns similar feeds in JSON format. Choose whichever format you prefer.
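A minimal sketch of the RSS route, using only the JDK's built-in DOM parser (`javax.xml.parsers`, also available on Android). It assumes an RSS 2.0 feed whose items carry their image either in a Media RSS `<media:thumbnail>` element (the BBC feed appears to use this) or in an `<enclosure>` with an image MIME type. The names `RssItems`, `NewsItem` and `parse` are hypothetical, and the example parses an XML string so it runs offline.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssItems {
    // Media RSS namespace, used by feeds that publish <media:thumbnail url="...">
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Hypothetical holder for the parts of a news item we want to display
    static final class NewsItem {
        final String title;
        final String description;
        final String imageUrl; // "" when the item carries no image

        NewsItem(String title, String description, String imageUrl) {
            this.title = title;
            this.description = description;
            this.imageUrl = imageUrl;
        }
    }

    static List<NewsItem> parse(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS(MEDIA_NS, ...)
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssXml)));

        List<NewsItem> items = new ArrayList<>();
        NodeList itemNodes = doc.getElementsByTagName("item");
        for (int i = 0; i < itemNodes.getLength(); i++) {
            Element item = (Element) itemNodes.item(i);
            items.add(new NewsItem(text(item, "title"), text(item, "description"), image(item)));
        }
        return items;
    }

    // Trimmed text of the first descendant element with this name, or "" if missing
    private static String text(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // Assumption: the image is a <media:thumbnail> or an image <enclosure>; otherwise ""
    private static String image(Element item) {
        NodeList thumbs = item.getElementsByTagNameNS(MEDIA_NS, "thumbnail");
        if (thumbs.getLength() > 0) {
            return ((Element) thumbs.item(0)).getAttribute("url");
        }
        NodeList enclosures = item.getElementsByTagName("enclosure");
        for (int i = 0; i < enclosures.getLength(); i++) {
            Element enc = (Element) enclosures.item(i);
            if (enc.getAttribute("type").startsWith("image/")) {
                return enc.getAttribute("url");
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        String sample = "<rss version=\"2.0\" xmlns:media=\"http://search.yahoo.com/mrss/\">"
                + "<channel><item>"
                + "<title>Example headline</title>"
                + "<description>Short summary of the story.</description>"
                + "<media:thumbnail url=\"https://example.com/a.jpg\"/>"
                + "</item></channel></rss>";
        for (NewsItem item : parse(sample)) {
            System.out.println(item.title + " | " + item.imageUrl);
        }
    }
}
```

To read a live feed, `DocumentBuilder.parse(String uri)` also accepts a URL directly; on Android that network call has to run off the main thread. The extracted title, summary and image URL can then be shown in your own layout instead of a WebView.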

Related

Web scraping with jsoup in Java: unable to scrape the full information

I need to scrape some information from a website. I can scrape it, but not all of the information is retrieved; a lot of data is missing. The following images help illustrate the problem:
I used jsoup to connect to the URL and then extracted this particular data with the following code:
Document doc = Jsoup.connect("https://www.awattar.com/tariffs/hourly#").userAgent("Mozilla/17.0").get();
Elements durationCycle = doc.select("g.x.axis g.tick text");
However, none of that information appeared in the result. So I printed the whole document retrieved from the URL, and it showed the following:
I can see the information when I download the page and read it as an input file, but not when I connect directly to the URL, and I want to connect to the URL. Any suggestions?
I hope my question is understandable; let me know if it is not clear enough.

jsoup limits the size of the response body it reads. Use the maxBodySize setting:
Document doc = Jsoup.connect("https://www.awattar.com/tariffs/hourly#").userAgent("Mozilla/17.0").maxBodySize(0).get();
A value of 0 means no limit.

Get the content of an RSS feed article

I would like to get the content of an article from an RSS feed.
For example, the CNN RSS feed (http://rss.cnn.com/rss/edition.rss) gives you a list of articles, but each article also has its own content.
Given this article (http://edition.cnn.com/2014/03/23/world/europe/uk-south-africa-dewani/index.html), how can I get the text "The Department of Justice and...... carjacking and denies any involvement in the killing."?
Is it possible to get the content of the article from the RSS feed, or what is the best way to do it?

Does jsoup find all images?

I'm trying to analyze different websites to find all of the images they contain.
For this I'm using jsoup with the following code:
Elements imagePath = doc.select("[src]");
for (Element e : imagePath) {
    System.out.println(e.attr("abs:src"));
}
When I run this on a domain I get a lot of images, but if I run the same thing on a subpage I get the same images.
For instance, http://www.example.com produces the same output as http://www.example.com/page1.
My question is: does jsoup find all images for all pages of a domain, or is it just luck that it produces the same output?

Are you updating your Document object? My guess (since not much code was provided) is that you parsed your domain into doc but did not do the same for the subpage. jsoup applies your select only to the current document and has nothing to do with subdomains, other pages, etc. (the document doesn't even have to come from a website).

I am using jsoup to pull images from a website URL, but I want the page to load first. Is there any way to do this?

The problem is that some of the URLs I am trying to pull from have JavaScript slideshow containers that haven't loaded yet. I want to get the images in the slideshow, but since the slideshow hasn't loaded, jsoup doesn't find the element. Is there any way to do this? This is what I have so far:
Document doc = Jsoup.connect("http://news.nationalgeographic.com/news/2013/03/pictures/130316-gastric-brooding-frog-animals-weird-science-extinction-tedx/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ng%2FNews%2FNews_Main+%28National+Geographic+News+-+Main%29").get();
Elements jpg = doc.select("img[src$=.jpg]");

jsoup doesn't execute JavaScript, but you can use an additional library for this. See:
Parse JavaScript with jsoup
Trying to parse html hidden by javascript

How to read XML data in android

I have been trying to read my XML file in my Android app, without success.
This is my XML file: http://collectionservice.byethost13.com/backup.XML
The document contains row tags, and I need to show only the IDs from all the row tags in the ListView on the first screen.
Can anybody give me an example? I would be very thankful.
In short, I want to show this XML file as follows:
the IDs from the XML go in the ListView on the first screen, and clicking a specific ID opens the next screen, which shows the ID, Name, Phone, Department and What_Ever of that ID.
Can anybody give me some code? I have to deliver this to the client today, I am new to Android, and I have tried many links without success.

Here is the official documentation, which is clear and simple: https://developer.android.com/training/basics/network-ops/xml.html

I found Simple XML very easy to use for parsing XML into objects.
Its documentation is well written, so it could help you.

You can use a SAXParser to parse the XML file on Android: extend the DefaultHandler class to read your tags.
You can find a simple example here.
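A minimal SAX sketch along those lines, using only `javax.xml.parsers.SAXParser` and `org.xml.sax.helpers.DefaultHandler` from the JDK. The layout of backup.XML is not shown here, so it assumes each `<row>` holds child elements such as `<ID>`, `<Name>` and `<Phone>`; the class `RowHandler` and its methods are hypothetical. Collecting every row as a map means one parse serves both the ID list for the first screen and the details for a clicked ID.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class RowHandler extends DefaultHandler {
    // One map per <row>: child element name -> text (assumed layout, e.g. "ID" -> "1")
    final List<Map<String, String>> rows = new ArrayList<>();

    private Map<String, String> currentRow; // non-null while inside a <row>
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("row")) {
            currentRow = new LinkedHashMap<>();
        }
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("row")) {
            rows.add(currentRow);
            currentRow = null;
        } else if (currentRow != null) {
            currentRow.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    // IDs in document order, for the first-screen list
    List<String> ids() {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> row : rows) {
            ids.add(row.get("ID"));
        }
        return ids;
    }

    static RowHandler parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RowHandler handler = new RowHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rows>"
                + "<row><ID>1</ID><Name>Alice</Name><Phone>555-0100</Phone></row>"
                + "<row><ID>2</ID><Name>Bob</Name><Phone>555-0101</Phone></row>"
                + "</rows>";
        RowHandler handler = parse(xml);
        System.out.println(handler.ids());       // [1, 2]
        System.out.println(handler.rows.get(1)); // {ID=2, Name=Bob, Phone=555-0101}
    }
}
```

On Android, the remote file has to be downloaded off the main thread (network calls on the main thread throw NetworkOnMainThreadException). The `ids()` list can back an ArrayAdapter for the ListView, and the clicked position indexes into `rows` for the detail screen.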
